METHODS FOR GENERATING AND ANALYZING TRANSCRIPT MARKERS

TECHNICAL FIELD 

The present invention relates generally to the field of molecular biology and specifically
to rapid, high-throughput gene discovery methods that facilitate genome closure and to methods
for analyzing gene expression patterns.

BACKGROUND ART 
Many genes have been isolated and their structures determined since the introduction of
recombinant cDNA technology. It has been predicted that through the efforts of the Human
Genome Project, sequencing of the entire human genome (genome closure) will be accomplished
sometime between 2001 and 2005 (Boguski, 1995, Molecular Medicine 333:645-647).
Strategies for achieving genome closure include methods for constructing 3' directional
normalized libraries which equalize cDNA representation (Soares et al., 1994, Proc. Natl. Acad.
Sci. 91:9928; Patanjali et al., 1991, Proc. Natl. Acad. Sci. USA 88:1943-1947) and methods
for creating cDNA libraries based on subtractive hybridization techniques, which provide a means
of isolating the coding sequences of genes that are differentially expressed, for example during
development or in disease states (Rubenstein et al., 1990, Nucleic Acids Research 18:4833-4842;
Travis et al., 1988, Proc. Natl. Acad. Sci. USA 85:1696-1700).

Elucidation of gene expression represents another level of complexity, equally important
to the elucidation of genetic structure. A gene expression pattern, once generated, can be used
directly as a diagnostic profile or as a gene discovery method. Seilhamer et al. (WO 95/20681,
filed January 27, 1995) disclose methods for the high-throughput sequence-specific analysis of
cDNAs and generation of transcript images. Matsubara et al. (WO 95/14772, filed November
11, 1994) disclose methods for generating 3' directed cDNA libraries which accurately reflect the
abundance ratio of mRNA in a cell. Velculescu et al. (1995, Science 270:484-487) describe a
method for the analysis of gene "tags" or transcripts which uses type IIs restriction enzymes and
involves the generation and analysis of short 3' nucleotide sequences, which may inherently
contain a substantial amount of 3' non-coding information. Kato (1995, Nucleic Acids Research
23:3685-3690) describes a method for the identification of 3' end cDNA fragments which
involves the use of type IIs restriction enzymes and PCR methodology. Kato notes that the method
has several technical limitations, including the presence of PCR-generated artifacts and
the fact that cDNA sequences lacking enzyme recognition sites will not be displayed.

Aoto et al. (1995, Eur. J. Biochem. 234:8-15) describe isolation of a cDNA clone
obtained from a cDNA library of mRNA prepared from cells treated with 5-azacytidine. The
deoxycytidine analog 5-aza-2'-deoxycytidine (5-azadCyd) is highly toxic in cultured cells and
animals and has been used clinically as an anti-cancer agent.

5-azadCyd has been used experimentally as a DNA methylation inhibitor to induce gene
expression and cellular differentiation (Juttermann et al., 1994, Proc. Natl. Acad. Sci. USA
91:11797-11801).

In spite of the availability of methods designed to facilitate gene discovery and elucidate
gene expression, there remains a need in the art for methods which will expedite the process of
gene discovery in a rapid, high-throughput manner, thereby contributing to the process of genome
closure. There also remains a need in the art for methods for analyzing gene expression patterns
in a rapid, high-throughput manner.

DISCLOSURE OF THE INVENTION 
The present invention relates, in part, to rapid, high-throughput gene discovery methods
that facilitate genome closure and to methods for analyzing gene expression patterns. The
present invention also relates to methods for the rapid, sequence-specific identification of
transcripts derived from an mRNA population. The present invention further relates to methods
for extending the nucleotide sequences of partial transcripts in a high-throughput manner using
polymerase chain reaction technology.

The present invention provides rapid, high-throughput methods for generating and
analyzing transcript markers from the 5' most end of cDNAs, and methods for generating and
analyzing two discontinuous transcript markers from a single cDNA, which provide the advantage
of obtaining more information from a single transcript than is possible with current methods. In
one aspect of the present invention, the two discontinuous markers are derived from both the 5'
and 3' ends of a single cDNA; in another embodiment of the present invention, the two
discontinuous markers are derived from random areas of the cDNA.

In one aspect of the present invention, a method is provided for generating and analyzing
transcript markers from the 5' most end of individual cDNAs of a cDNA library, comprising the
steps of: obtaining a cDNA library comprising individual cDNAs having a first restriction
endonuclease site, for a restriction endonuclease that digests the cDNA at the 5' most end within
an expected distance from its recognition site, and a second restriction endonuclease site;
subjecting the cDNAs to digestion with the first and the second restriction endonucleases, thereby
excising transcript markers from the 5' most end of the individual cDNAs; ligating said transcript
markers to a vector; transforming said vector containing the transcript markers into a host cell;
culturing said host cells; and performing nucleic acid sequence analysis of the transcript markers.
This method can be used for gene discovery purposes or transcript imaging purposes. This
method can be combined with PCR-based technology for the rapid nucleic acid sequence analysis
of the transcript markers.
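The excision step can be pictured with a short in-silico sketch. The Python below assumes a hypothetical 5' adapter in which a PvuII site (CAG^CTG, blunt) overlaps a BpmI site (CTGGAG, cutting 16 nt downstream on the top strand and 14 nt on the bottom strand), and models T4 DNA polymerase trimming of the resulting 2 nucleotide 3' overhang. Under that assumption each excised marker is 20 bp (6 bp of adapter plus 14 bp of cDNA). The adapter layout, example sequence and function name are illustrative only, not the sequences used in the invention.

```python
BPMI_SITE = "CTGGAG"   # BpmI recognition sequence
PVUII_SITE = "CAGCTG"  # PvuII recognition sequence, blunt cut between CAG and CTG
BPMI_TOP_CUT = 16      # BpmI cuts 16 nt downstream of its site on the top strand...
BPMI_BOTTOM_CUT = 14   # ...and 14 nt downstream on the bottom strand (2 nt 3' overhang)


def excise_5prime_marker(construct: str) -> str:
    """Return the blunt-ended 5' transcript marker (top strand, 5'->3').

    `construct` is the top strand of a hypothetical adapter followed by the 5' end
    of a cDNA. The first BpmI site is assumed to lie in the adapter, pointing into
    the cDNA, with a PvuII site upstream of or overlapping it.
    """
    seq = construct.upper()
    bpm = seq.find(BPMI_SITE)
    if bpm < 0:
        raise ValueError("no BpmI site found")
    # Last PvuII site that ends no later than the end of the BpmI site.
    pvu = seq.rfind(PVUII_SITE, 0, bpm + len(BPMI_SITE))
    if pvu < 0:
        raise ValueError("no PvuII site upstream of the BpmI site")
    pvu_cut = pvu + 3  # blunt cut CAG^CTG
    site_end = bpm + len(BPMI_SITE)
    if site_end + BPMI_TOP_CUT > len(seq):
        raise ValueError("cDNA too short for BpmI to cut")
    # Before polishing, the top strand runs to site_end + 16 and the bottom strand to
    # site_end + 14; T4 DNA polymerase removes the 2 nt 3' overhang, leaving a blunt
    # end at the bottom-strand cut.
    return seq[pvu_cut:site_end + BPMI_BOTTOM_CUT]
```

With the overlapping layout assumed here the PvuII cut falls exactly at the start of the BpmI site, which is why the marker carries 6 bp of adapter; a different spacing between the two sites would change that number.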

In another aspect of the present invention, a method is provided for generating and
analyzing non-contiguous transcript markers derived from the 5' most and 3' most ends of
individual cDNAs, comprising the steps of: obtaining a cDNA library comprising individual
cDNAs having, at the 5' most end and at the 3' most end, a first restriction endonuclease site for a
restriction endonuclease that digests nucleic acid within an expected distance from its
recognition site, and a second restriction endonuclease site; subjecting the cDNAs
to digestion with the first endonuclease, thereby creating linearized cDNAs containing transcript
markers from the 5' most end and the 3' most end of the individual cDNAs; self-ligating said
linearized cDNAs, thereby joining the transcript markers from the 5' most and 3' most ends to
create cDNAs containing non-contiguous transcript markers; transforming said self-ligated cDNAs
into a host cell and culturing said host cells; isolating said cDNA from said host cells; digesting
the cDNA with the second restriction endonuclease, thereby excising the non-contiguous transcript
markers; ligating said excised transcript markers to a second vector; transforming said second
vector containing the transcript markers into a host cell and culturing said host cells; and
performing nucleic acid sequence analysis of the transcript markers.

In another aspect of the present invention, a method is provided for the rapid, sequence-specific
identification of cDNAs derived from a human mRNA population, comprising the steps
of: obtaining a cDNA library comprising individual cDNAs, wherein said cDNAs contain first
restriction endonuclease sites for an endonuclease having a 4 base pair recognition site and
wherein said cDNAs are cloned into a first vector lacking the first restriction endonuclease sites;
subjecting the cDNA library to digestion with the first restriction endonuclease, thereby creating
linearized cDNAs containing a portion of the original cDNA; ligating an adapter to said
linearized cDNAs, wherein the adapter contains a second restriction endonuclease site for an
endonuclease that cleaves within an expected range of its recognition site, thereby creating
cDNAs containing two non-contiguous transcript markers joined by the adapter; digesting the
cDNAs with the second restriction endonuclease, thereby excising the transcript markers from
said cDNAs; concatenating said excised transcript markers; ligating said concatenated transcript
markers to a second vector; transforming said second vector containing the transcript markers
into a host cell and culturing said host cells; and performing nucleic acid sequence analysis on
the transcript markers in said host cells.
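The text does not say why an endonuclease with a 4 base pair recognition site is chosen, but a rough estimate suggests one reason. Under the simplifying assumption that bases are equiprobable and independent, a k bp site occurs on average once every 4^k bp, so a 4 bp site (about every 256 bp) should be present in almost every cDNA, whereas a 6 bp site (about every 4,096 bp) is absent from many. The sketch below makes that estimate; it treats overlapping windows as independent and counts a single strand (appropriate for a palindromic site), so it is an approximation, and the function names are hypothetical.

```python
def expected_spacing(site_len: int) -> float:
    """Mean distance (bp) between sites, assuming equiprobable, independent bases."""
    return 4.0 ** site_len


def fraction_with_site(cdna_len: int, site_len: int) -> float:
    """Approximate probability that a cDNA of cdna_len bp carries at least one site.

    Each of the (cdna_len - site_len + 1) start positions is treated as an
    independent trial with success probability 4**-site_len. Overlapping windows
    are not truly independent, so this is only an approximation.
    """
    positions = max(cdna_len - site_len + 1, 0)
    p = 4.0 ** -site_len
    return 1.0 - (1.0 - p) ** positions
```

For a 1,000 bp cDNA this gives about 0.98 for a 4 bp site and about 0.22 for a 6 bp site, in line with the concern noted by Kato that cDNAs lacking recognition sites go undetected.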

The present invention also provides novel methods and vectors used in the preparation of
cDNA libraries containing transcript markers. The present invention also provides novel,
high-throughput methods for obtaining complete nucleotide sequence information on transcript
markers. In one embodiment of the present invention, cDNA libraries containing a serial
arrangement of multiple transcript markers are constructed and subjected to high-throughput
nucleic acid sequence analysis in a multi-well format using polymerase chain reaction (PCR)
technology and specialized PCR primers.

In additional aspects of the present invention, the cDNA libraries are constructed to bias
the cDNA population toward rare cDNAs. In one aspect of the present invention, a cDNA library
is constructed by normalization techniques, and in another the cDNA library is constructed using
subtractive hybridization techniques. In yet another aspect of the present invention, the cDNA
library has been constructed from mRNA of cells treated with demethylating agents, such as
5-aza-2'-deoxycytidine and 5-azacytidine, which induce the transcription of silent genes. In an
additional aspect of the present invention, the cDNA library has been constructed using oligo dT
primers, and in another aspect, the cDNA library is constructed using random primers.

BRIEF DESCRIPTION OF DRAWINGS 
Figure 1 illustrates a general schematic of cDNA library construction.

Figure 2 illustrates Array of Transcript Markers (ATM) Strategy I for constructing a
cDNA library containing an array of transcript markers derived from the 5' most end of a cDNA
(5' transcript markers).

Figure 3 illustrates ATM Strategy II for constructing a cDNA library containing an array
of 5' transcript markers.

Figures 4A-4B illustrate ATM Strategy III for constructing a cDNA library containing an 
array of 5' transcript markers. 

Figures 5A-5B illustrate ATM Strategy IV for constructing a cDNA library containing an 
array of 5' transcript markers. 
Figures 6A-6B illustrate ATM Strategy V for constructing a cDNA library containing an
array of 5' transcript markers.

Figure 7 illustrates a strategy for simultaneously obtaining two non-contiguous transcript
markers from both the 5' and 3' ends of a cDNA.

Figure 8 illustrates a strategy for obtaining non-contiguous transcript markers from 
random areas of a cDNA. 

Figure 9 illustrates a sequence-specific approach to the identification of all transcribed
genes from an mRNA population.

Figure 10 is a list of 12 known sequences identified by the method illustrated in Figure 2. 

Figure 11 illustrates the toxicity curve for THP-1 cells treated for 3 days with
5-aza-2'-deoxycytidine.

Figures 12A-12B illustrate examples of Type IIs restriction endonucleases useful for
generating ATM libraries. BpmI, BsgI and Eco57I are three restriction endonucleases that cleave
16 bp away from their recognition site with a 2 nucleotide 3' overhang. BpmI is illustrated in
Figure 12A. N's represent nucleotides found adjacent to the restriction site (e.g., the 5' end of a
cDNA). Treatment with T4 DNA polymerase removes the 3' extension, which leaves 14 bp
derived from the adjacent sequence. Figure 12B illustrates that the restriction endonuclease
BsmFI cleaves 14 bp away from its recognition site with a 4 nucleotide 5' overhang. This end
may be filled in by treatment with a DNA polymerase, which results in 14 bp derived from
adjacent sequence.
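The arithmetic behind Figures 12A-12B can be written as a small function. Using the usual (N/M) convention, where N and M are the distances from the recognition site to the cut on the top and bottom strands, BpmI (CTGGAG), BsgI (GTGCAG) and Eco57I (CTGAAG) are (16/14) and BsmFI (GGGAC) is (10/14), values consistent with the overhangs described above. Trimming a 3' overhang with T4 DNA polymerase, or filling in a 5' overhang, leaves the blunt end at the bottom-strand cut in both cases, so each enzyme retains 14 bp of adjacent sequence. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeIIsEnzyme:
    name: str
    site: str
    top_cut: int     # N in (N/M): nt from the site to the top-strand cut
    bottom_cut: int  # M in (N/M): nt from the site to the bottom-strand cut


def polish(enzyme: TypeIIsEnzyme) -> tuple[str, int]:
    """Describe the overhang and the bp of adjacent sequence left after blunting."""
    if enzyme.top_cut > enzyme.bottom_cut:
        # Top strand protrudes: a 3' overhang, removed by T4 DNA polymerase.
        return f"{enzyme.top_cut - enzyme.bottom_cut} nt 3' overhang", enzyme.bottom_cut
    if enzyme.top_cut < enzyme.bottom_cut:
        # Bottom strand protrudes: a 5' overhang, filled in by a DNA polymerase.
        return f"{enzyme.bottom_cut - enzyme.top_cut} nt 5' overhang", enzyme.bottom_cut
    return "blunt", enzyme.top_cut


ENZYMES = [
    TypeIIsEnzyme("BpmI", "CTGGAG", 16, 14),
    TypeIIsEnzyme("BsgI", "GTGCAG", 16, 14),
    TypeIIsEnzyme("Eco57I", "CTGAAG", 16, 14),
    TypeIIsEnzyme("BsmFI", "GGGAC", 10, 14),
]

for enz in ENZYMES:
    overhang, kept = polish(enz)
    print(f"{enz.name}: {overhang}, {kept} bp retained")
```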

Figure 13 illustrates a vector useful in the strategy of Figure 8, from which all sites for
restriction endonucleases having a 4 base pair recognition site have been removed.
Figures 14A-14B illustrate a typical concatenated array of transcript markers.

Figure 15 illustrates an abundance profile of THP-1 cells treated with 5-aza-2'-deoxycytidine.
The black-shaded area represents control clones, the gray-shaded area represents
5-aza-2'-deoxycytidine-treated cells and the gray line represents the abundance profile of the
5-aza-2'-deoxycytidine-treated cells.
MODES FOR CARRYING OUT THE INVENTION

Before the present compounds, variants, formulations and methods for making and using
such are described, it is to be understood that this invention is not limited to the particular
compounds, variants, formulations or methods described, as such variants, formulations and
methodologies may, of course, vary. It is to be understood that the terminology used herein is for
the purpose of describing particular embodiments only, and is not intended to be limiting, since
the scope of the present invention will be limited only by the appended claims.

It must be noted that as used in the specification and the appended claims, the singular
forms "a", "an" and "the" include plural references unless the context clearly dictates otherwise,
which will be known to those skilled in the art or will become known to them upon reading this
specification.

I. Definitions

As used herein, the term "transcript marker derived from a cDNA library" refers to an
isolated polynucleotide derived from an individual cDNA and being preferably from about 10
base pairs to about 20 base pairs in length.

As used herein, the term "non-contiguous transcript marker from an individual cDNA"
refers to two polynucleotides which are not adjacent to one another under naturally occurring
conditions, but which are constructed to exist in tandem, with each polynucleotide being preferably
from about 10 base pairs to about 20 base pairs in length.

As used herein, the term "3' transcript marker" or "transcript marker from the 3' most end
of a cDNA" refers to an isolated polynucleotide derived from the 3' most end of an individual
cDNA and being preferably from about 10 base pairs to about 20 base pairs in length.

As used herein the term "5' transcript marker" or "5' transcript marker derived from a
cDNA library" refers to an isolated 5' most nucleic acid sequence of a cDNA which is preferably
about 10 base pairs to about 20 base pairs in length. As used herein, "5' most" means that a 5'
transcript marker may represent the 5' end of the full-length coding region of a cDNA and may
include 5' untranslated sequences. Alternatively, a 5' most transcript marker may represent an
internal coding region of an individual cDNA. Each 5' transcript marker can reflect the expression
of an individual cDNA.

As used herein, the term "adapter" refers to a synthetic fragment of nucleic acid which is
ligated to a cDNA and which may contain recognition sites for restriction endonucleases.

As used herein the term "5' adapter" refers to a synthetic fragment of nucleic acid which
is ligated onto the 5' end of cDNAs prior to ligation to a vector. In the present invention, the 5'
adapter contains a first restriction endonuclease site, for an endonuclease which digests nucleic
acid at an expected distance from its enzymatic recognition site, and at least one other restriction
endonuclease site. The second restriction endonuclease site is for a restriction endonuclease which
digests the cDNA to form blunt ends or 5' or 3' overhangs. In a preferred embodiment of the
present invention, digestion of the cDNA with the first and second restriction enzymes excises
transcript markers which are at least 20 base pairs in length and contain at least 14 base pairs of
cDNA sequence and 6 base pairs of adapter sequence.

As used herein the term "array(s) of transcript markers" or "ATM" refers to the collection
or serial arrangement of transcript markers prepared by concatenation of individual transcript
markers. As used herein, an "ATM cDNA library" is a cDNA library containing transcript
markers as inserts. The inserts may be individual inserts, multiple inserts or a serial arrangement
of multiple inserts, which may be in sense or antisense orientation.

As used herein the term "full-length" coding region refers to the cDNA sequence for the
entire transcribed mRNA for a particular protein from the initiating methionine to the poly A tail.

As used herein the term "type IIs restriction endonuclease" or "type IIs enzyme" refers to
that category of restriction endonucleases that digests nucleic acid at an expected distance from
the enzymatic recognition site. Preferred examples of type IIs enzymes for use in the present
invention are those which digest nucleic acid at least 10 base pairs from the recognition site, or
which digest nucleic acid less than 10 base pairs from the recognition site but leave ends that can
be filled in enzymatically to give a 10 base pair transcript marker. Examples of such type IIs
enzymes include, but are not limited to, BpmI, BsgI, Eco57I and BsmFI. Examples of type IIs
restriction endonucleases are illustrated in Figures 12A-12B. As will be understood by those of
skill in the art, it is possible to alter enzymatic digestion conditions to change the nucleic acid
digestion position of the type IIs restriction endonuclease.

As used herein the term "type IIsg restriction endonuclease" or "type IIsg enzyme" refers
to that category of restriction endonucleases that digests nucleic acid within an expected range of
its enzymatic recognition site. Examples of type IIsg restriction enzymes include, but are not
limited to, BcgI.

As used herein the term "concatenating" refers to the process of ligating multiple 
transcript markers prior to ligation in a cloning vector. 

As used herein the term "normalized cDNA library" or "normalized library" refers to a
cDNA library constructed in such a manner as to reduce the redundancy in high-level abundance
cDNAs.

As used herein the term "subtractive hybridization" (when referring to construction of a
cDNA library) refers to a process wherein a first population of nucleic acid is hybridized with a
second labelled population of nucleic acid (driver), and the resultant nucleic acid hybrids are
removed to completion, thereby identifying and isolating a set of nucleic acid sequences unique to
the first population of nucleic acid, which may be used in the construction of a cDNA library.

As used herein the term "selected set" of random primers refers to a set of primers
designed to anneal to a specific region of mRNA. In a preferred embodiment herein, the selected
set of random primers is designed to anneal to the 5' most region of mRNA used as starting
material for cDNA synthesis.

As used herein the term "transcript imaging" refers to a method of determining the
relative abundance of individual transcript markers in a cDNA library. The term "relative
abundance" refers to the number of times an individual transcript marker appears relative to the
total number of transcript markers identified.
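The relative-abundance calculation that defines a transcript image is a simple tally. This sketch assumes the transcript markers have already been extracted from sequence reads as strings; the function name is hypothetical.

```python
from collections import Counter


def transcript_image(markers: list[str]) -> dict[str, float]:
    """Map each distinct transcript marker to its relative abundance.

    Relative abundance = occurrences of the marker / total markers identified.
    Entries are ordered from most to least abundant.
    """
    counts = Counter(markers)
    total = sum(counts.values())
    return {marker: n / total for marker, n in counts.most_common()}
```

Comparing the dictionaries produced for two libraries, for example from control and 5-aza-2'-deoxycytidine-treated cells, gives the kind of abundance comparison shown in Figure 15.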

As used herein the term "non-amplified growth" of host cells refers to growth conditions
which allow for uniform growth of recombinant host cells.
As used herein, the terms "high abundance" and "high-level abundance" refer to messages
present at greater than 10,000 copies per cell.

As used herein, the terms "mid abundance" and "mid-level abundance" refer to species present
at 100 to 400 copies per cell.

As used herein, the terms "low abundance" and "low-level abundance" refer to species present
at less than 15 copies per cell. This low-abundance class represents 20 to 50 percent of the
unique transcripts in the cell.

As used herein the term "wobble primer" refers to a sequencing primer degenerate at the
3' end to allow sequencing for all three possible bases following the poly A tail.
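A wobble primer is easiest to see by expanding its degenerate position. As an illustrative assumption, the sketch uses an oligo(dT) primer of 18 T residues with a 3' V (the IUPAC code for A, C or G): because the base immediately upstream of the poly A tail cannot be A, the primer's 3' base only needs to pair with C, G or T, so three variants suffice. The primer length and helper names are not taken from the text.

```python
from itertools import product

# IUPAC nucleotide codes (single bases and degenerate positions).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer: str) -> list[str]:
    """List every concrete sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer.upper()))]


# Hypothetical wobble primer: oligo(dT)18 with a 3' V.
WOBBLE = "T" * 18 + "V"
variants = expand_degenerate(WOBBLE)
```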

The present invention relates to rapid, high-throughput gene discovery methods that
facilitate genome closure and to methods for analyzing gene expression patterns. The present
invention is based, in part, upon the discovery of methods for the generation of transcript markers
derived from the 5' end of cDNAs contained within a cDNA library. The present invention is
also based, in part, upon the discovery of methods for the generation of two discontinuous
transcript markers derived from both the 5' and 3' ends of cDNAs contained within a cDNA
library.

The transcript markers of the present invention are derived from previously constructed
cDNA libraries. The nucleotide information contained within transcript markers can be used to
identify novel transcripts or to provide the basis for the design of PCR primers useful in
extending and identifying a transcript marker nucleotide sequence contained within a cDNA
library. cDNA libraries of the present invention can be constructed by methods which equalize
the population of cDNAs or bias the population of cDNAs toward rare cDNAs, for example by
normalization techniques, subtraction techniques, treatment with demethylating agents and
treatment with differentiating agents.

The present invention also provides novel methods for achieving genome closure which
combine rapid nucleic acid sequence identification of transcript markers from a cDNA with
subsequent extension of the sequence of the identified transcript markers using PCR technology.
Methods of the present invention may also be used to provide a transcript image of specific
tissues or biological samples.
II. Construction of cDNA libraries

cDNA libraries for use in the methods of the present invention may be prepared by
methods described in Maniatis et al. (1982, Molecular Cloning: A Laboratory Manual, Cold
Spring Harbor Laboratory, Cold Spring Harbor, New York) or by any means known to those of
skill in the art.

cDNA libraries may be constructed to bias the population of individual cDNAs toward
desired coding regions. A cDNA library constructed with oligo dT primers would be expected to
contain nucleic acid sequences from the 3' coding region of a cDNA, as well as 3' untranslated
regions. Additionally, a cDNA library constructed with oligo dT primers may also contain part
or all of the coding region of a particular cDNA. A cDNA library constructed with random
primers would be expected to contain nucleic acid sequences from all parts of the coding region
of a cDNA, including the 5' most nucleic acid sequence of the full-length coding region.

cDNA libraries constructed with selected sets of random primers, such as primers which
specifically prime first strand cDNA synthesis toward the 5' end (for example, by use of the
CapFinder™ PCR construction kit, Clontech K1051-1), are desirable for obtaining 5'
transcript markers from the 5' most end of a putative transcript or cDNA.

For gene discovery purposes, preferred methods for the construction of cDNA libraries
include: the use of random primers or a selected set of random primers designed to prime cDNA
synthesis from the 5' end of the mRNA; the use of cell lines treated with demethylating
compounds, such as 5-aza-2'-deoxycytidine, as starting material for the preparation of mRNA;
the use of cell lines treated with compounds which induce differentiation, such as retinoic acid;
normalization or equalization methods; the use of subtractive hybridization techniques in the
construction of cDNA libraries; and any method that induces the transcription of silent genes or
allows for the identification of rare cDNAs, such as size fractionation of specific cDNA
populations expected to contain rare cDNAs.

Preferred methods for the construction of cDNA libraries intended for transcript imaging
purposes are those which produce an unbiased population of cDNAs and would include a step for
the non-amplified or uniform growth of host cells used in constructing the cDNA library. Such
growth conditions are described in Current Protocols in Molecular Biology, "Amplification of
Cosmid and Plasmid Libraries," Unit 5.10 (1987).
In the normalization process, the prevalence of high-abundance cDNA clones decreases
dramatically, clones with mid-level abundance are relatively unaffected, and clones for rare
transcripts are effectively increased in abundance. In the Soares et al. (1994, supra)
normalization procedure, the abundance levels of individual cDNA clones are equalized
by kinetic re-annealing hybridization. This approach is designed to reduce the initial 10,000-fold
variation in individual cDNA frequencies in order to achieve abundances within one order of
magnitude while maintaining the overall sequence complexity of the library.
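The equalizing effect of kinetic re-annealing follows from idealized second-order kinetics: a species present at initial concentration C0 that reassociates with rate constant k retains C = C0 / (1 + k C0 t) in single-stranded form after time t. Once k C0 t is large, C approaches 1/(kt) regardless of C0, so abundant species are depleted far more than rare ones. The sketch below uses arbitrary units and an arbitrarily chosen kt; it illustrates the trend only and is not a model of the Soares et al. protocol. Function names are hypothetical.

```python
def single_stranded_after(c0: float, kt: float) -> float:
    """Idealized second-order reannealing: amount left single-stranded, C0 / (1 + k*C0*t)."""
    return c0 / (1.0 + kt * c0)


def abundance_range(c_high: float, c_low: float, kt: float) -> float:
    """Fold difference between two species in the single-stranded (normalized) fraction."""
    return single_stranded_after(c_high, kt) / single_stranded_after(c_low, kt)


for kt in (0.0, 0.001, 0.01, 0.1):
    print(f"kt={kt}: {abundance_range(10_000, 1, kt):.1f}-fold")
```

With kt = 0.1, the starting 10,000-fold difference between the high-abundance and rare species shrinks to roughly 11-fold, close to the one-order-of-magnitude target of the normalization procedure.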

cDNA libraries prepared by normalization techniques are not an accurate reflection of the
source tissue's gene-expression profile; however, cDNA libraries produced by normalization
techniques may provide a source of low-abundance transcripts and, therefore, would be useful for
gene discovery purposes.

A variety of normalization techniques are known by those of skill in the art for the
production of normalized cDNA libraries. Weissmann S.M. (1987, Mol. Bio. Med. 4:133-143)
describes a method based on hybridization to genomic DNA wherein the frequency of each
hybridized cDNA in the resulting normalized library would be proportional to that of each
corresponding gene in the genomic DNA. Ko (1990, Nucleic Acids Res. 18:5705-5711) and
Patanjali et al. (1991, Proc. Natl. Acad. Sci. 88:1943-1947) describe a kinetic approach to
constructing normalized cDNA libraries. Soares (WO 95/08647, filed September 23, 1994 and
published March 30, 1995) describes a method to normalize a 3' directional cDNA library.

Subtractive hybridization of nucleic acids is a method to isolate the coding sequences of
genes which are differentially expressed, such as during development or in disease states. cDNA
libraries prepared by subtractive hybridization techniques are described in Rubenstein et al.
(1990, Nucleic Acids Research 18:4833-4842) and Travis et al. (1988, Proc. Natl. Acad. Sci. USA
85:1696-1700).

Obtaining cDNA libraries from cells treated with 5-aza-2'-deoxycytidine should enhance
the discovery of rare genes and of genes expressed in specialized cell types from which it is
difficult to isolate and/or prepare DNA. A non-toxic concentration of 5-aza-2'-deoxycytidine can
be predetermined through titration/toxicity assays as illustrated in Figure 11.
5-aza-2'-deoxycytidine has been shown to induce the transcription of silent genes through
demethylation (Hsieh et al., supra). The preferred amount of 5-aza-2'-deoxycytidine to use is that
amount that induces the transcription of silent genes without being toxic to the cell. This method
can be coupled to subtractive methods to further enhance the discovery of novel transcripts or genes.
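Choosing the working concentration from a titration/toxicity assay can be expressed as a simple rule. The sketch assumes viability is recorded as a fraction of the untreated control at each concentration, that toxicity rises with dose, and picks the highest concentration at which viability, at that dose and every lower dose, stays at or above a chosen threshold. The 0.9 threshold and the function name are hypothetical and are not taken from Figure 11.

```python
def highest_nontoxic_dose(viability: dict[float, float], min_viability: float = 0.9):
    """Return the highest concentration at or below which every tested dose keeps
    viability >= min_viability (fraction of untreated control), or None."""
    chosen = None
    for conc in sorted(viability):
        if viability[conc] < min_viability:
            break  # toxicity assumed to increase with dose, so stop at the first failure
        chosen = conc
    return chosen
```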
III. Construction of ATM Libraries

ATM libraries containing transcript markers can be constructed using any of the methods
described in Figures 2-8. Figure 2 illustrates ATM Strategy I for constructing a cDNA library
containing an array of transcript markers derived from the 5' most end of a cDNA (5' transcript
markers). In Strategy I, a cDNA library is constructed using an adapter containing a BpmI site
and a PvuII restriction site. The constructed cDNA library is digested with BpmI and PvuII to
isolate 20 bp 5' transcript markers (14 base pairs of cDNA and 6 base pairs from the adapter)
from the cDNA library; the isolated markers are treated with T4 DNA polymerase to yield blunt
ends, and the markers are concatenated to form an array or serial arrangement of multiple
markers. The concatenated array of 5' transcript markers is then used to create a library
containing an array of 5' transcript markers. In a variation of this method, individual transcript
markers can be subjected to subtractive hybridization methods to bias the population toward rare
transcript markers.
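Reading the markers back out of a sequenced concatemer is the step that makes the array format efficient. The sketch below assumes, hypothetically, that every marker is exactly 20 bp, begins with a 6 bp adapter-derived segment CTGGAG followed by 14 bp of cDNA, and that the read has already been trimmed so that it starts at a marker boundary. Because blunt-end concatenation can join markers in either orientation, each 20 bp window is checked in both orientations. The adapter segment and function names are illustrative assumptions.

```python
ADAPTER = "CTGGAG"  # hypothetical adapter-derived 6 bp at the start of each marker
MARKER_LEN = 20     # 6 bp adapter + 14 bp cDNA
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMP)[::-1]


def split_concatemer(read: str) -> list[str]:
    """Return the 14 bp cDNA tags, in sense orientation, from a concatemer read."""
    tags = []
    for start in range(0, len(read) - MARKER_LEN + 1, MARKER_LEN):
        window = read[start:start + MARKER_LEN]
        if window.startswith(ADAPTER):            # marker in sense orientation
            tags.append(window[len(ADAPTER):])
        elif revcomp(window).startswith(ADAPTER):  # marker in antisense orientation
            tags.append(revcomp(window)[len(ADAPTER):])
        else:
            break  # window not recognized: stop rather than guess the register
    return tags
```

A production pipeline would also need to handle sequencing errors and lost register; the sketch simply stops at the first window it cannot recognize.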

Another strategy is shown in Figure 3, which illustrates ATM Strategy II for constructing
a cDNA library containing an array of 5' transcript markers. In Strategy II, a cDNA library is
constructed using an adapter containing a BpmI site and a PvuII restriction site. The constructed
cDNA library is digested with BpmI, and a second degenerate adapter containing a PvuII site is
ligated onto the BpmI-cleaved end. After PCR amplification of the template and PvuII digestion,
25 bp transcript markers are isolated. The isolated transcript markers are concatenated to form
arrays, ligated into a vector and transformed into host cells to create a library containing an array
of 5' transcript markers.
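The need for a degenerate adapter comes from the BpmI overhang itself: the two protruding nucleotides are cDNA-derived and therefore unknown in advance. The sketch, written for illustration with hypothetical names, treats two 3' overhangs (each written 5' to 3') as able to anneal when one is the reverse complement of the other, and shows that an adapter pool covering all 16 dinucleotide overhangs has a partner for every possible end, while any single fixed overhang pairs with only one of them.

```python
from itertools import product

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMP)[::-1]


def can_anneal(fragment_overhang: str, adapter_overhang: str) -> bool:
    """Two 3' overhangs (each written 5'->3') pair if they are reverse complements."""
    return adapter_overhang == revcomp(fragment_overhang)


# Every possible 2 nt 3' overhang left by BpmI, since those bases come from the cDNA.
FRAGMENT_OVERHANGS = ["".join(p) for p in product("ACGT", repeat=2)]

# A degenerate NN adapter is modelled as a pool containing all 16 overhangs.
NN_POOL = set(FRAGMENT_OVERHANGS)


def partner_in_pool(fragment_overhang: str, pool: set[str]):
    """Return the adapter overhang in `pool` that pairs with the fragment, or None."""
    match = revcomp(fragment_overhang)
    return match if match in pool else None
```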

ATM Strategy III, as shown in Figures 4A-4B, illustrates the construction of a cDNA
library containing an array of 5' transcript markers. In Strategy III, a cDNA library is constructed
using an adapter containing a BpmI site and a PvuII restriction site. The constructed cDNA
library is digested with BpmI, and a second degenerate adapter containing a PvuII site is ligated
onto the BpmI-cleaved end. Sense RNA is transcribed from the template and the mixture is
subjected to DNase treatment. First and second strand cDNA are synthesized from the template.
The double-stranded cDNA is digested with PvuII, which yields 25 bp transcript markers. The
transcript markers are concatenated to form arrays, the arrays are cloned into a vector and the
vector is transformed into host cells to create a library containing an array of 5' transcript markers.

ATM Strategy IV, shown in Figures 5A-5B, is a further method for constructing a cDNA
library containing an array of 5' transcript markers. In Strategy IV, a cDNA library is
constructed using an adapter containing a BpmI and a PvuII restriction site. The constructed
cDNA library is digested with BpmI and PvuII to isolate 20 bp 5' transcript markers from the
cDNA library; the isolated markers are treated with T4 DNA polymerase to yield blunt ends, and
the markers are concatenated to form an array or serial arrangement of multiple markers.
Adapters are added onto the ends of the transcript marker arrays and the arrays are amplified
using PCR technology. The arrays are cloned into a vector to create a library containing arrays
of transcript markers (ATM).

Another strategy is shown in Figures 6A-6B, which illustrate ATM Strategy V for
constructing a cDNA library containing an array of 5' transcript markers. In Strategy V, a cDNA
library is constructed using an adapter containing a BpmI and a PvuII restriction site. The
constructed cDNA library is digested with BpmI and PvuII to isolate 20 bp 5' transcript markers
from the cDNA library; the isolated markers are treated with T4 DNA polymerase to yield blunt
ends, and the markers are concatenated to form an array or serial arrangement of multiple
markers. At this point, the arrays are ligated into a plasmid vector, subjected to PCR to amplify
the transcript markers, and cloned into a vector to create a cDNA library containing PCR-amplified
arrays of 5' transcript markers.

Figure 7 illustrates a strategy for simultaneously obtaining two non-contiguous transcript markers, one from the 5' end and one from the 3' end. In this strategy, first-strand cDNA synthesis is performed using a modified random hexamer designed to provide directionality. Second-strand cDNA synthesis is performed by standard means, and an adapter containing a BpmI and a PvuII site is ligated onto both ends of the cDNA. The cDNA is ligated to a vector modified to delete all BpmI restriction sites. The vector containing the cDNA is transformed into a host cell to create a cDNA library, and plasmid DNA is isolated and treated with BpmI, thereby creating individual linearized cDNAs containing both a 5' and a 3' transcript marker. The linearized cDNA is treated with T4 polymerase to produce blunt ends, self-ligated and re-transformed to create a library. The plasmid DNA of the library is isolated and digested with PvuII to excise the nucleic acid containing the non-contiguous 5' and 3' transcript markers. The excised transcript markers are concatenated and cloned to create a library containing a serial arrangement of non-contiguous 5' and 3' transcript markers, which can be subjected to high-throughput nucleic acid sequence analysis. In another variation of this method, the vector is constructed to contain 5' and 3' BpmI sites at a cloning site, thereby providing a means to excise the transcript marker from the cDNA.
Figure 8 illustrates a strategy for obtaining non-contiguous transcript markers from random areas of a cDNA. In this strategy, mRNA is converted to cDNA following standard procedures. The cDNA is ligated to a vector from which the sites for restriction endonucleases having 4 base pair recognition sites have been removed. The cDNA library is amplified and the plasmids are prepared. The cDNA library is digested with a restriction endonuclease having a 4 base pair recognition site, and the digested plasmid is purified. A 12 base pair adapter containing a site for BcgI, which cleaves DNA on both sides of, and at a fixed distance from, its recognition site, is ligated onto the linearized cDNA, and the cDNA is transformed into a host cell. The cDNA library is digested with BcgI, releasing a 36 base pair fragment from each cDNA that originally contained the 4 base pair restriction endonuclease site. The fragments may be ligated into a vector directly, or concatenated and then ligated into a vector, and subjected to sequence-specific analysis.

The ATM libraries of the present invention may contain a single transcript marker per individual clone or multiple transcript markers in a serial arrangement. As illustrated in the figures, transcript markers may be concatenated, PCR amplified and then ligated into a vector, or concatenated, ligated into a vector and then PCR amplified.

In a preferred embodiment of the present invention, transcript markers excised from the cDNA library contain a cDNA portion and a synthetic adapter portion. In the embodiment shown in Figure 2, the cDNA portion is at least 14 base pairs in length and the adapter portion is 6 base pairs long and designed to be asymmetric. The adapter portion provides a means for determining the sense orientation of the transcript marker in the vector, as well as a means for determining the beginning of each excised transcript marker. Nucleic acid sequencing of the excised transcript markers can be performed to create a nucleic acid data set. For sequence analysis purposes, the adapter portion can be subtracted or removed from the transcript marker nucleic acid data set.

The ATM markers may exist in sense or antisense orientation in the vector. The presence of a nucleic acid fragment that provides directionality, such as an asymmetric adapter portion, provides the means for distinguishing the sense from the antisense strand and for determining the beginning of each transcript marker. Figure 14A illustrates sequence data from a single clone containing an array of 6 transcript markers. The 6 bp "spacer" DNA sequences that distinguish one transcript marker from another are underlined. The spacer sequence "CTGGAG" indicates that the immediately adjacent 14 bp to the right is a transcript marker (sense strand). The spacer sequence "CTCCAG" indicates that the adjacent 14 bp to the left is a transcript marker (antisense strand). Figure 14B lists the six 14 bp transcript markers (sense strand) derived from the array in Figure 14A, without the spacer DNA sequence. Asterisks (*) indicate sequences which are the reverse complement of the sequence actually found in the array.
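The spacer rule lends itself to a short parsing routine. This is a sketch, not the analysis software used for Figure 14: it assumes error-free reads, exactly 14 bp per marker, the spacer CTGGAG in sense orientation and its reverse complement CTCCAG in antisense orientation (inferred from the BpmI adapter), and a toy read assembled in the code. The function names are hypothetical.

```python
SENSE_SPACER = "CTGGAG"      # precedes a sense-strand marker
ANTISENSE_SPACER = "CTCCAG"  # reverse complement; follows an antisense marker
MARKER_LEN = 14
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def split_markers(read: str, k: int = MARKER_LEN) -> list[tuple[str, bool]]:
    """Split an array read into sense-strand markers.

    Returns (marker, flipped) pairs; flipped=True marks markers that were read
    in antisense orientation and have been reverse-complemented.
    """
    read = read.upper()
    unit = len(SENSE_SPACER) + k  # one spacer plus one marker = 20 bp
    markers = []
    i = 0
    while i + unit <= len(read):
        if read[i:i + len(SENSE_SPACER)] == SENSE_SPACER:
            markers.append((read[i + len(SENSE_SPACER):i + unit], False))
            i += unit
        elif read[i + k:i + unit] == ANTISENSE_SPACER:
            markers.append((reverse_complement(read[i:i + k]), True))
            i += unit
        else:
            i += 1  # not at a marker boundary (e.g. vector sequence); slide on
    return markers


# Toy read: one sense unit, then one unit ligated in antisense orientation.
m1, m2 = "ACGTTGCAAGGCTT", "GGATCCTTAGCAAT"
read = SENSE_SPACER + m1 + reverse_complement(SENSE_SPACER + m2)
print(split_markers(read))
# [('ACGTTGCAAGGCTT', False), ('GGATCCTTAGCAAT', True)]
```

Checking for the sense spacer first can misread an antisense unit whose marker happens to begin with CTGGAG, and stretches that do not divide into 14 bp markers would need separate handling.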
IV. Method for Nucleic Acid Sequencing

Nucleic acid sequencing of transcript markers can be performed by any means known to those of skill in the art. Methods for cDNA sequencing employ such enzymes as the Klenow fragment of DNA polymerase I, Sequenase® (US Biochemical Corp, Cleveland OH), Taq polymerase (Perkin Elmer, Norwalk CT), thermostable T7 polymerase (Amersham, Chicago IL), or combinations of recombinant polymerases and proofreading exonucleases such as the ELONGASE Amplification System marketed by Gibco BRL (Gaithersburg MD). Preferably, the process is automated with machines such as the Hamilton Micro Lab 2200 (Hamilton, Reno NV), the Peltier Thermal Cycler (PTC200; MJ Research, Watertown MA) and the ABI 377 DNA sequencers (Perkin Elmer).
V. PCR Methods

Numerous PCR methods are known to those of skill in the art that would facilitate isolation, amplification and/or extension of nucleic acid sequences in the 5' or 3' direction. Gobinda et al (1993; PCR Methods Applic 2:318-22) disclose "restriction-site" polymerase chain reaction (PCR) as a direct method which uses universal primers to retrieve unknown sequence adjacent to a known locus. First, cDNA is amplified in the presence of a primer to a linker sequence and a primer specific to the known region. The amplified sequences are subjected to a second round of PCR with the same linker primer and another specific primer internal to the first one. Products of each round of PCR are transcribed with an appropriate RNA polymerase and sequenced using reverse transcriptase.

Inverse PCR can be used to amplify or extend sequences using divergent primers based on a known region (Triglia T et al (1988) Nucleic Acids Res 16:8186). The method uses several restriction enzymes to generate a suitable fragment in the known region of a gene. Adapters ligated onto the cDNAs allow the fragment to be circularized by intramolecular ligation, and the circularized products then serve as PCR templates.

Capture PCR (Lagerstrom M et al (1991) PCR Methods Applic 1:111-19) is another method which may be used. The method involves PCR amplification of DNA fragments adjacent to a known sequence in human and yeast artificial chromosome DNA. Capture PCR also requires multiple restriction enzyme digestions and ligations to place an engineered double-stranded sequence into an unknown portion of the DNA molecule before PCR. Another PCR method which may be used to retrieve sequences is that of Parker JD et al (1991; Nucleic Acids Res 19:3055-60).
Capillary electrophoresis may be used to analyze the size or confirm the nucleotide sequence of sequencing or PCR products. Systems for rapid sequencing are available from Perkin Elmer, Beckman Instruments (Fullerton CA) and other companies. Capillary sequencing may employ flowable polymers for electrophoretic separation, four different fluorescent dyes (one for each nucleotide) which are laser activated, and detection of the emitted wavelengths by a charge-coupled device camera. Output light intensity is converted to an electrical signal using appropriate software (e.g., Genotyper™ and Sequence Navigator™ from Perkin Elmer), and the entire process, from loading of samples to computer analysis and electronic data display, is computer controlled. Capillary electrophoresis is particularly suited to the sequencing of small pieces of DNA which might be present in limited amounts in a particular sample. The reproducible sequencing of up to 350 bp of M13 phage DNA in 30 min has been reported (Ruiz-Martinez MC et al (1993) Anal Chem 65:2851-2858).
VI. Method for Determining Genome Closure

The present invention provides novel methods for achieving genome closure by combining rapid nucleic acid sequence identification of transcript markers from a cDNA with subsequent extension of the sequence of the identified transcript markers using PCR technology. Transcript markers constructed by the method illustrated in Figure 8, concatenated and ligated into a vector, allow the sequence-specific identification of 15-20 transcript markers per vector. In one embodiment described herein, PCR technology is used to extend the sequence of a transcript marker in an outward direction as described in US Patent Application 08/487,112, filed June 7, 1995, specifically incorporated by reference. One primer is synthesized to initiate extension in the antisense direction (XLR) and the other is synthesized to extend sequence in the sense direction (XLF). The primers allow the known sequence to be extended "outward", generating amplified nucleotide sequences containing new, unknown nucleotide sequence for the region of interest. As shown below, PCR primers which contain the sequence of a known adapter joining the non-contiguous transcript markers, together with all possible combinations of nucleotides for the 5 nucleotide positions flanking the adapter, are used in PCR reactions to extend the sequence of the identified transcript markers.

NNNNN [16 base pair adapter] NNNNN

Because each N position could be any of 4 nucleotides, a total of 4^5 = 1024 individual primers would be required for each of the 5' and 3' extensions, giving a total of 1024 x 1024 = 1,048,576 primer combinations, which will be specific enough to amplify any DNA.
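The primer arithmetic can be checked by enumeration: five degenerate positions with four bases each give 4^5 = 1024 flanks per side, and pairing every 5' primer with every 3' primer gives 1024 x 1024 combinations. This sketch enumerates only the degenerate flanks; the adapter sequence itself is not given in this passage, so it is left out.

```python
from itertools import product

BASES = "ACGT"
FLANK_LEN = 5  # degenerate positions on each side of the adapter


def flank_variants(n: int = FLANK_LEN) -> list[str]:
    """Every n-nucleotide flank; there are 4**n of them."""
    return ["".join(p) for p in product(BASES, repeat=n)]


flanks = flank_variants()
per_side = len(flanks)         # primers needed for one extension direction
pairs = per_side * per_side    # 5' primer x 3' primer combinations
print(per_side, pairs)         # 1024 1048576
```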

cDNA libraries prepared with oligo dT primers allow identification of the 3'-most part of the cDNA and part of the coding region, or of the complete 5'-most end of a cDNA. cDNA libraries constructed with random primers in combination with techniques that result in a high representation of the 5' ends of cDNA, such as the Cap-Finder™ PCR construction kit (Clontech), allow amplification of the 5' ends of genes.

In another method, illustrated in Figure 9, genome closure can be achieved by the identification of all transcribed genes from an mRNA source. As illustrated in Figure 9, step 1, a cDNA library containing a biotinylated poly A tail and having a NotI restriction site is constructed by standard means and cloned into a vector from which the sites for 4 base pair restriction endonucleases have been removed. Multiple cDNA libraries derived from a variety of mRNA sources can be pooled to provide one sample. In Figure 9, step 2, the cDNA is digested with a restriction endonuclease which has a 4 base pair recognition site, and the digested cDNA is captured by streptavidin beads (Figure 9, step 3). The cDNA is digested with NotI, which removes the biotin, and ligated into a vector having a site for the Type IIS restriction endonuclease MboII (Figure 9, step 4). Other Type IIS restriction endonucleases, such as BpmI, BsgI, Eco57I and BsmFI, can be used. The cDNA is digested with MboII, which cuts 8 base pairs into the cDNA. A known linker of sufficient size for PCR amplification is cloned into the MboII restriction site, and the cDNA pools are subjected to PCR analysis using the primers described above. The PCR extension products can be sequenced directly or with a wobble primer, i.e., a sequencing primer degenerate at the 3'-end to allow for all 3 possible bases following the poly A tail, or they can be religated, cloned into a vector and then subjected to nucleic acid sequencing.

VII. Transcript Imaging

Transcript imaging is a method for evaluating changes in gene expression caused by factors such as disease progression, pharmacologic treatment and aging. Transcript imaging is accomplished by sequencing several thousand clones from a particular tissue or cell type and electronically recording the abundance levels for each mRNA species identified. Electronic manipulations can then be done to examine which mRNAs are up- or down-regulated, or unchanged.

Transcript imaging can be achieved using transcript markers produced by the present invention. After nucleic acid sequencing of the transcript markers is accomplished, the abundance levels for each transcript marker are electronically recorded. Electronic manipulations can then be performed to examine which transcript markers are up- or down-regulated, or unchanged.
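The electronic comparison can be sketched as counting each marker in two data sets and comparing normalized abundances. This is an illustration under assumptions that do not come from the source: counts are normalized per 10,000 markers, a pseudocount of 1 is added, and a two-fold change is used as the cut-off. The marker labels are placeholders.

```python
from collections import Counter


def abundance(markers: list[str]) -> dict[str, float]:
    """Counts of each marker per 10,000 markers sequenced from one library."""
    counts = Counter(markers)
    total = sum(counts.values())
    return {m: 10_000 * c / total for m, c in counts.items()}


def compare(control: list[str], treated: list[str],
            fold: float = 2.0, pseudo: float = 1.0) -> dict[str, str]:
    """Call each marker 'up', 'down' or 'unchanged' in treated vs control."""
    a, b = abundance(control), abundance(treated)
    calls = {}
    for m in sorted(set(a) | set(b)):
        # pseudocount avoids division by zero for markers seen in one library only
        ratio = (b.get(m, 0.0) + pseudo) / (a.get(m, 0.0) + pseudo)
        if ratio >= fold:
            calls[m] = "up"
        elif ratio <= 1 / fold:
            calls[m] = "down"
        else:
            calls[m] = "unchanged"
    return calls


control = ["m1"] * 50 + ["m2"] * 50
treated = ["m1"] * 50 + ["m2"] * 10 + ["m3"] * 40
print(compare(control, treated))
# {'m1': 'unchanged', 'm2': 'down', 'm3': 'up'}
```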

INDUSTRIAL APPLICABILITY 
I. Construction of cDNA Libraries

The following example describes the construction of a cDNA library. The first step is to isolate mRNA from a desired biological tissue or cell source. The mRNA is then used in the synthesis of cDNA.
RNA Isolation

RNA is isolated using guanidinium isothiocyanate and 2-mercaptoethanol lysis, followed by ultracentrifugation over a cesium chloride gradient to obtain total RNA (Chirgwin et al). Alternatively, total RNA can be isolated using acid/phenol extraction (Chomczynski et al), and polyadenylated RNA can be isolated directly using a biotinylated oligo dT primer. An optical density measurement is taken to assess the quantity of total RNA isolated, and an aliquot is run on an electrophoresis gel to assess the quality and integrity of the RNA. The RNA is then stored at -80° C, which prevents degradation, until needed.

To obtain cleaner total RNA, each sample is treated with DNAse and acid phenol, followed by precipitation and washing. The RNA is again run on an electrophoresis gel to make sure it is free of genomic DNA contamination. Subsequent selection of polyadenylated (poly A) RNA is done with either an oligo(dT)-based affinity column or Oligotex™ latex microspheres. The quality of the isolated mRNA is confirmed, and the sample is used in cDNA library construction.

Synthesis of first-strand cDNA is initiated using reverse transcriptase and a poly(dT) primer that is complementary to the poly(A) stretch at the 3' end of most transcripts. The primer used in this reaction contains a restriction enzyme recognition site (NotI) that permits directional insertion into an appropriate cloning vector. Second-strand cDNA synthesis is based on the method developed by Gubler and Hoffman (1983). RNAse H nicks the RNA/cDNA hybrid created in the reverse transcription reaction, creating priming sites for E. coli DNA polymerase to synthesize second-strand cDNA. The gaps in the second strand are ligated together using E. coli DNA ligase.

After the ends of the cDNA are blunted with T4 or Pfu DNA polymerase, an adaptor is ligated to the double-stranded cDNA. This oligonucleotide, which contains an EcoRI-compatible sticky end, allows for directional cloning of the cDNA once it has been digested with NotI, the restriction enzyme whose site is found at the 3' terminus of the cDNA. The cDNA is then size-fractionated to remove very short cDNAs, which would inhibit the generation of highly complex libraries. At this point, the cDNA is ligated into a plasmid vector system and transformed into bacterial cells for propagation.

II. Construction of ATM cDNA libraries

Three cDNA libraries were constructed containing an adapter with a site for the Type IIS restriction endonuclease BpmI at the 5' end of each cDNA insert.

Two µg each of poly(A)+ RNA from human colon, human prostate and a single-species control RNA (bacterial chloramphenicol acetyltransferase) were reverse transcribed using an oligo-dT primer and Superscript reverse transcriptase according to the manufacturer's instructions (Gibco Superscript Plasmid System). Following second-strand synthesis, 3 µg of the adapter containing the BpmI site was ligated to each cDNA.

The adapter containing the BpmI site was prepared beforehand as follows: two oligonucleotides (5' AATTCAGCTGGAG and 5' phos-CTCCAGCTG) were synthesized and purified by HPLC and polyacrylamide gel electrophoresis (New England Biolabs). Equimolar amounts of the two oligos were combined in annealing buffer (20 mM Tris, pH 7.4, 2 mM MgCl2, 50 mM NaCl), boiled, and allowed to cool to <30 degrees C in a heating block.

Following adapter ligation, the cDNA was digested with NotI, fractionated over a Sepharose CL-4B column and cloned into a pSPORT vector (LTI, Inc.).
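As a quick in-silico check of the adapter described above, the two oligonucleotides should anneal to leave a 5' AATT overhang (compatible with EcoRI ends) and a duplex in which the PvuII site (CAGCTG) overlaps the BpmI site (CTGGAG). The sketch assumes the bottom oligo pairs entirely with the 3' end of the top oligo; the helper names are illustrative.

```python
def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def anneal(top: str, bottom: str) -> tuple[str, str]:
    """Return (5' overhang of top strand, duplex region) for two annealed oligos.

    Assumes the bottom oligo pairs fully with the 3' end of the top oligo.
    """
    duplex = reverse_complement(bottom)
    if not top.endswith(duplex):
        raise ValueError("oligos do not anneal as assumed")
    return top[: len(top) - len(duplex)], duplex


top = "AATTCAGCTGGAG"   # 5' AATTCAGCTGGAG
bottom = "CTCCAGCTG"    # 5' phos-CTCCAGCTG (the phosphate is not modeled)
overhang, duplex = anneal(top, bottom)
print(overhang)                  # AATT  (EcoRI-compatible)
print(duplex.find("CAGCTG"))     # 0     (PvuII site)
print(duplex.find("CTGGAG"))     # 3     (BpmI site)
```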

To create an ATM library, we followed the procedure outlined in ATM Strategy I, as illustrated in Figure 2, starting with the prostate cDNA library containing the BpmI adapter described above. This library was transformed into E. coli strain DH10B by electroporation and grown either on LB plates or in LB broth, each supplemented with carbenicillin. Transformed cells were collected either by scraping (plates) or by centrifugation (media), and plasmid DNA was isolated (Promega or Qiagen systems). More than 100 µg of plasmid DNA was then digested with BpmI and PvuII. Multiple 20 µg aliquots of plasmid were digested for 1.5 hours using 14 U of BpmI (New England Biolabs) in a 150 µl volume at 37 degrees C. Thirty units of PvuII (New England Biolabs) were then added, and digestion proceeded for an additional hour at 37 degrees C. The digested DNA was fractionated on a 20% TBE acrylamide gel. A 20 bp band was excised, and the DNA was recovered by electroelution or by incubating the crushed gel slices in Tris/EDTA overnight. The transcript markers were treated with T4 DNA polymerase and either cloned directly into pSPORT or concatenated.
2. Analysis of ATM libraries

ATM libraries were made according to ATM Strategy I, illustrated in Figure 2. The vector background of these libraries was determined by screening randomly selected clones from the respective libraries by restriction enzyme digestion of plasmids with BpmI and PvuII, thereby releasing the transcript marker inserts (n = number of clones screened). Library size is determined by subtracting the background from the number of individual colonies or transformants generated from a single ligation reaction of the transcript markers to the pSPORT vector. The average number of markers per clone, shown in Table I, is determined by both insert size and DNA sequence verification.

TABLE I

Library       Vector background   Library size   Avg. # markers/clone
Prostate 1    10% (n=10)          1.1 x 10'      3.8
Prostate 2    0% (n=12)           0.95 x 10'     4.9
Prostate 3    0% (n=12)           0.12 x 10'     3.8
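The library-size calculation amounts to discounting the number of transformants by the measured vector background. A minimal sketch, assuming background is applied as a simple fraction of transformants; the transformant count below is hypothetical, while 10% background and 3.8 markers/clone are the Prostate 1 values from Table I.

```python
def library_size(transformants: int, background: float) -> float:
    """Insert-containing clones = transformants discounted by vector background."""
    if not 0.0 <= background <= 1.0:
        raise ValueError("background must be a fraction between 0 and 1")
    return transformants * (1.0 - background)


def total_markers(size: float, markers_per_clone: float) -> float:
    """Expected number of transcript markers represented in the library."""
    return size * markers_per_clone


# Hypothetical ligation yielding 100,000 transformants at 10% background,
# with the 3.8 markers/clone reported for Prostate 1.
size = library_size(100_000, 0.10)
print(round(size), round(total_markers(size, 3.8)))  # 90000 342000
```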

Arrays of transcript markers from the prostate ATM library were PCR amplified for 30 cycles using M13 forward and reverse primers. These arrays were size selected on a 6% acrylamide gel and cloned into a pSPORT-derived vector. Twenty-seven random clones were sequenced, and a total of 131 transcript markers were analyzed. There was an average of 4.9 markers per clone, with a range of 4-7 markers per clone. Table II shows the distribution of marker sizes among the markers analyzed.

TABLE II

Marker size (bp)   Number (%)
12                 2 (1.5)
13                 6 (4.6)
14                 102 (77.9)
15                 14 (10.7)
16                 1 (0.8)
14/15              6 (4.6)

The marker size "14/15" refers to an ambiguous situation in which there are 29 bp (instead of 28 bp) between spacers because two markers are concatenated back to back, as shown:

spacer | 29 bp: two markers back to back | spacer
CTGGAG   NNNNNNNNNNNNNNNNNNNNNNNNNNNNN

Because 13 and 16 bp markers are much less frequent than 14 and 15 bp markers, the 29 bp between the spacer sequences are assumed to consist of one 14 bp and one 15 bp marker, rather than one 13 bp and one 16 bp marker. Thus, the true total number of 14 bp markers was 105 (102+3; 80.2%) and the true total number of 15 bp markers was 17 (14+3; 13%).
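The correction for the ambiguous class is bookkeeping: the six markers in the 14/15 class come from three 29 bp stretches, each assigned one 14 bp and one 15 bp marker. The sketch recomputes the corrected totals from the Table II counts.

```python
# Marker-size counts from Table II (bp -> number of markers).
counts = {12: 2, 13: 6, 14: 102, 15: 14, 16: 1}
ambiguous = 6                        # markers in the "14/15" class
total = sum(counts.values()) + ambiguous
pairs = ambiguous // 2               # each 29 bp stretch holds two markers

true_14 = counts[14] + pairs         # one 14 bp marker per pair
true_15 = counts[15] + pairs         # one 15 bp marker per pair


def pct(n: int) -> float:
    """Percentage of all markers analyzed, to one decimal place."""
    return round(100 * n / total, 1)


print(total, true_14, pct(true_14), true_15, pct(true_15))  # 131 105 80.2 17 13.0
```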

III. Isolation of a full length cDNA

A transcript marker (5' ccgagagtcgtcgg) was identified from a prostate ATM library which corresponds to the gene apoferritin H (GenBank GI numbers: g31340, g31342, g28434). Based on the published sequence, this marker is present at the 5' end of the mRNA, 14 nucleotides downstream from the start of transcription and 181 nucleotides upstream of the start of the coding sequence. The entire mRNA is predicted to be about 0.9 kb.

As illustrated below, to isolate the apoferritin H cDNA from the ATM library, a gene-specific primer ("g31340") was designed that contains 7 nucleotides of the BpmI adapter and the 14 nucleotides of the transcript marker (5' gctggag ccgagagtcgtcgg). The cDNA inserts from 1 µg of the prostate ATM library were amplified by 30 cycles of PCR using the M13 forward and reverse primers. This PCR reaction was diluted 1:50 in water, and 1 µl was reamplified for 33 cycles using the nested T7 and "g31340" primers. A 0.9 kb band was isolated, gel purified and cloned; DNA sequencing confirmed the identity of the cloned gene as full-length apoferritin H. Alternatively, the 0.9 kb PCR product is sequenced directly with appropriate primers.
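The gene-specific primer is the 3'-most 7 nt of the adapter top strand joined to the 14 nt marker. A sketch assuming the adapter top strand 5' AATTCAGCTGGAG from Section II and the marker sequence given above; the helper name `gene_specific_primer` is hypothetical.

```python
ADAPTER_TOP = "AATTCAGCTGGAG"   # BpmI/PvuII adapter, top strand
MARKER = "CCGAGAGTCGTCGG"       # apoferritin H transcript marker


def gene_specific_primer(marker: str, adapter: str = ADAPTER_TOP,
                         adapter_nt: int = 7) -> str:
    """Join the 3'-most adapter nucleotides to the marker they precede in the clone."""
    return adapter[-adapter_nt:] + marker


primer = gene_specific_primer(MARKER)
print(primer, len(primer))  # GCTGGAGCCGAGAGTCGTCGG 21
```

The result matches the primer sequence given above; including adapter nucleotides anchors the primer at the adapter/marker junction of the clone rather than relying on the 14 nt marker alone.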

[Diagram: positions of the M13 forward, M13 reverse, "g31340" and T7 primers relative to the vector and cDNA insert]

IV. High Throughput Isolation of cDNA clones

High-throughput isolation of cDNA clones from an ATM library is achieved in the following manner. First, a master pool of insert cDNA is created from an ATM library by PCR amplification using primers found in the vector (e.g., M13 forward and reverse). Second, gene-specific primers are synthesized in 96-well arrays (Gibco). Third, aliquots of gene-specific primers, PCR reagents and the master cDNA pool are dispensed into 96 PCR wells for PCR and subsequently analyzed by gel electrophoresis. In addition, an initial screen for successful PCR reactions is accomplished by real-time fluorescent detection of PCR products. With this technique, only those reactions which give a significant fluorescent signal above background are then analyzed by gel electrophoresis.
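The real-time prescreen is, in effect, a threshold on each well's fluorescence. The rule used below (mean plus three standard deviations of no-template control wells) and all well readings are assumptions chosen for illustration; the source says only that wells must give a significant signal above background.

```python
from statistics import mean, stdev


def wells_to_gel(signals: dict[str, float], controls: list[float],
                 n_sd: float = 3.0) -> list[str]:
    """Wells whose fluorescence exceeds mean + n_sd * SD of the control wells."""
    threshold = mean(controls) + n_sd * stdev(controls)
    return sorted(w for w, s in signals.items() if s > threshold)


controls = [100.0, 104.0, 98.0, 102.0]   # hypothetical no-template wells
signals = {"A1": 850.0, "A2": 101.0, "B7": 430.0, "H12": 105.0}
print(wells_to_gel(signals, controls))   # ['A1', 'B7']
```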

V. Extension of Transcript Markers to Full Length

The nucleic acid sequence of transcript markers can be used to design oligonucleotide primers for extending a partial nucleotide sequence to full length or for obtaining 5' sequences from genomic libraries. One primer is synthesized to initiate extension in the antisense direction (XLR) and the other is synthesized to extend sequence in the sense direction (XLF). The primers allow the known sequence to be extended "outward", generating amplified nucleotide sequences containing new, unknown nucleotide sequence for the region of interest (US Patent Application 08/487,112, filed June 7, 1995, specifically incorporated by reference). The initial primers are designed from the cDNA using OLIGO® 4.06 Primer Analysis Software (National Biosciences), or another appropriate program, to be 22-30 nucleotides in length, to have a GC content of 50% or more, and to anneal to the target sequence at temperatures of about 68°-72° C. Any stretch of nucleotides which would result in hairpin structures or primer-primer dimerization is avoided. The original selected cDNA libraries, or a human genomic library, are used to extend the sequence. A genomic library is most useful for obtaining 5' upstream regions. If more extension is necessary or desired, additional sets of primers are designed to further extend the known region.
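Before running dedicated software such as OLIGO, a candidate primer can be prescreened against the length and GC rules stated above. This sketch checks only those two rules plus a crude 3'-end self-complementarity flag, which is a heuristic assumed here for illustration; it does not estimate the 68°-72° C annealing temperature, which requires a proper thermodynamic model.

```python
def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def screen_primer(seq: str, min_len: int = 22, max_len: int = 30,
                  min_gc: float = 0.50, k: int = 4) -> dict:
    """Check the length and GC rules plus a crude 3'-end self-complementarity flag."""
    seq = seq.upper()
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    # Heuristic: if the reverse complement of the 3'-terminal k-mer occurs in the
    # primer, its 3' end could pair within the same molecule or with another copy.
    self_comp = reverse_complement(seq[-k:]) in seq
    return {
        "length_ok": min_len <= len(seq) <= max_len,
        "gc_ok": gc >= min_gc,
        "gc": round(gc, 2),
        "flag_3prime_self_complementary": self_comp,
    }


print(screen_primer("ATGCATGCATGCATGCATGCATGC"))  # palindromic repeats: flagged
print(screen_primer("AAAAAAAAAAAAAAAAAAAAAAAA"))  # fails the GC rule
```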

By following the instructions for the XL-PCR kit (Perkin Elmer) and thoroughly mixing the enzyme and reaction mix, high-fidelity amplification is obtained. Beginning with 40 pmol of each primer and the recommended concentrations of all other components of the kit, PCR is performed using the Peltier Thermal Cycler (PTC200; MJ Research, Watertown MA) and the following parameters:

Step 1 94° C for 1 min (initial denaturation)
Step 2 65° C for 1 min
Step 3 68° C for 6 min
Step 4 94° C for 15 sec
Step 5 65° C for 1 min
Step 6 68° C for 7 min
Step 7 Repeat steps 4-6 for 15 additional cycles
Step 8 94° C for 15 sec
Step 9 65° C for 1 min
Step 10 68° C for 7:15 min
Step 11 Repeat steps 8-10 for 12 cycles
Step 12 72° C for 8 min
Step 13 4° C (and holding)
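Totalling the hold times gives a rough estimate of how long the program runs. The sketch assumes that "15 additional cycles" means 16 passes through steps 4-6, that steps 8-10 run 12 times in total, and that "7:15 min" means 7 min 15 s; ramp times and the final 4° C hold are ignored.

```python
# Hold times (seconds) for the XL-PCR program listed above; steps 7 and 11
# are loop instructions and step 13 is the indefinite 4 °C hold.
hold_s = {
    1: 60, 2: 60, 3: 360,           # 94, 65, 68 °C
    4: 15, 5: 60, 6: 420,           # 94, 65, 68 °C
    8: 15, 9: 60, 10: 7 * 60 + 15,  # 94, 65, 68 °C ("7:15 min" read as 7 min 15 s)
    12: 480,                        # 72 °C final extension
}
passes = {step: 1 for step in hold_s}
for step in (4, 5, 6):
    passes[step] = 1 + 15           # first pass plus 15 additional cycles
for step in (8, 9, 10):
    passes[step] = 12               # assumed: 12 passes in total

total_s = sum(hold_s[s] * passes[s] for s in hold_s)
print(total_s, total_s / 60)        # 15000 250.0
```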

A 5-10 µl aliquot of the reaction mixture is analyzed by electrophoresis on a low-concentration (about 0.6-0.8%) agarose mini-gel to determine which reactions were successful in extending the sequence. Bands thought to contain the largest products were selected and cut out of the gel. Further purification involves using a commercial gel extraction method such as QIAquick™ gel extraction (QIAGEN Inc). After recovery of the DNA, Klenow enzyme was used to trim single-stranded nucleotide overhangs, creating blunt ends which facilitate religation and cloning.

After ethanol precipitation, the products are redissolved in 13 µl of ligation buffer, 1 µl T4 DNA ligase (15 units) and 1 µl T4 polynucleotide kinase are added, and the mixture is incubated at room temperature for 2-3 hours or overnight at 16° C. Competent E. coli cells (in 40 µl of appropriate media) are transformed with 3 µl of ligation mixture and cultured in 80 µl of SOC medium (Sambrook J et al, supra). After incubation for one hour at 37° C, the whole transformation mixture is plated on Luria Bertani (LB) agar (Sambrook J et al, supra) containing 2x Carb. The following day, several colonies are randomly picked from each plate and cultured in 150 µl of liquid LB/2x Carb medium placed in an individual well of an appropriate, commercially available, sterile 96-well microtiter plate. The following day, 5 µl of each overnight culture is transferred into a non-sterile 96-well plate and, after dilution 1:10 with water, 5 µl of each sample is transferred into a PCR array.

For PCR amplification, 18 µl of concentrated PCR reaction mix (3.3x) containing 4 units of rTth DNA polymerase, a vector primer and one or both of the gene-specific primers used for the extension reaction are added to each well. Amplification is performed using the following conditions:



Step 1 94° C for 60 sec
Step 2 94° C for 20 sec
Step 3 55° C for 30 sec
Step 4 72° C for 90 sec
Step 5 Repeat steps 2-4 for an additional 29 cycles
Step 6 72° C for 180 sec
Step 7 4° C (and holding)



Aliquots of the PCR reactions are run on agarose gels together with molecular weight markers. The sizes of the PCR products are compared to those of the original partial cDNAs, and appropriate clones are selected, ligated into plasmid and sequenced.
VI. 5-aza-2'-Deoxycytidine Treatment of Cells

5-aza-2'-deoxycytidine induces transcription of silent genes, presumably by demethylating cytosines in CpG islands, which are regulatory regions located upstream of most genes. Obtaining libraries from cells treated with 5-aza-2'-deoxycytidine will enhance the discovery of rare genes and of genes expressed in specialized cell types from which it is difficult to isolate and prepare RNA.
Methods 

THP-1 cells at a density of 1.1 million cells per ml were treated for three days with 0.8 micromolar 5-aza-2'-deoxycytidine. The growth medium was Iscove's modified DMEM with 10% fetal bovine serum.

hNT precursor cells at 80% confluency were treated for three days with 0.35 micromolar 5-aza-2'-deoxycytidine. The growth medium was Iscove's modified DMEM.

Because 5-aza-2'-deoxycytidine has been shown to be toxic in cultured cells and animals, initial experiments were conducted to assess its toxicity on hNT and THP-1 cells and to establish conditions under which the cells would survive and RNA could be recovered. A concentration of around 1 micromolar 5-aza-2'-deoxycytidine is typically used to induce silent gene transcription. A concentration of 0.8 micromolar was selected for THP-1 cells and 0.35 micromolar for hNT cells.

Results

Northern analysis to measure the RNA levels of two identified genes was performed on the cells to verify that 5-aza-2'-deoxycytidine was inducing gene transcription under the conditions used. Significant induction of both identified genes by 5-aza-2'-deoxycytidine was obtained in the hNT cells. Many genes were found to be induced by 5-aza-2'-deoxycytidine in both THP-1 cells and hNT cells.

The following 3 genes were found in the cDNA library constructed from mRNA from THP-1 cells treated with 5-aza-2'-deoxycytidine and not in the control cDNA library: GenBank g29382, human BBC1 mRNA; GenBank g337507, human ribosomal protein S25 mRNA; and g184553, human insulinoma pig-analog mRNA.

The following genes were found both in the cDNA library constructed from mRNA from THP-1 cells treated with 5-aza-2'-deoxycytidine and in the control cDNA library, and were upregulated in treated THP-1 cells: GenBank g182055, human neutrophil elastase mRNA, 3' end; g36143, human mRNA for ribosomal protein S11; g793842, human mRNA for ribosomal protein L29; g28976, human mRNA for azurocidin; and g348911, human glycoprotein mRNA.

The following 3 genes were found in the cDNA library constructed from mRNA from hNT cells treated with 5-aza-2'-deoxycytidine and not in the control cDNA library: g190233, human acidic ribosomal phosphoprotein P1; g436217, human mRNA (KIAA0037) for ORF; and g385936, hinge-OXPHOS system complex III.

An additional result of 5-aza-2'-deoxycytidine treatment was a decrease in the expression of many genes, particularly of more abundant mRNAs.
VII. Construction of the Plasmid pIIEZ-1

Plasmid pIIEZ-1 was derived from pUC19 vector by removing the 991 base pairs from 
the restriction endonuclease sites Sspl through An3. A 90 base pair synthetic poly linker 
25 containing type lis restriction endonuclease sites were ligated to the remaining pUC19 vector, as 
shown in Figure 13. 

The type lis poly linker was made by armealing the two synthetic oligomers P-lI-l and P- 
II-2. The oligomer sequences are: 
P-IM 
30 5' 

CATGTGCGGCCGCGGCGCGCCGTCAGCTGCACAGATGCGAGCTCCAGGCATTCATCC 
TTAAGTACTTCAGTAGACGTCCCTGCAGGTGAAT'rC 

.24- 
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P-II-2 
5' 

GAATTCACCTGCAGGGACGTCTACTGAAGTACTIAAGGATGAATGCCTGCAGCTCGC 
ATCTGTGCAGCTGACGGCGCGCCGCGGCCGCA 
5 After annealing, the double stranded linker has a 5' blunt end which is ligaled into the 

Sspl restriction endonuclease site and a 4 bp (5' CATG) extension which is complimentary to the 
Afl3 restriction endonuclease site. 

As illustrated in Figure 13, the polylinker contains four type IIS restriction endonuclease
sites which can be used for the generation of 14 bp transcript markers.
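The annealing and site layout described above can be checked in silico. The Python sketch below is only an illustration: it uses the oligomer sequences as given in the sequence listing (SEQ ID NO:23 and NO:24), confirms that P-II-2 pairs with P-II-1 everywhere except the 5' CATG extension, and searches both strands for the recognition sequences of the type IIS enzymes named in the claims (BpmI, BsgI, Eco57I and BsmFI). The recognition sequences are assumptions taken from standard enzyme references rather than from this specification, and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

# Oligomer sequences as given in the sequence listing (SEQ ID NO:23 and NO:24).
P_II_1 = ("CATGTGCGGCCGCGGCGCGCCGTCAGCTGCACAGATGCGAGCTCCAGGCATTCATCC"
          "TTAAGTACTTCAGTAGACGTCCCTGCAGGTGAATTC")
P_II_2 = ("GAATTCACCTGCAGGGACGTCTACTGAAGTACTTAAGGATGAATGCCTGGAGCTCGC"
          "ATCTGTGCAGCTGACGGCGCGCCGCGGCCGCA")

# Recognition sequences (5'->3'), assumed from standard enzyme references.
TYPE_IIS_SITES = {"BpmI": "CTGGAG", "BsgI": "GTGCAG",
                  "Eco57I": "CTGAAG", "BsmFI": "GGGAC"}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(top: str, sites: Dict[str, str]) -> Dict[str, List[Tuple[str, int]]]:
    """Map each enzyme to (strand, 0-based position) hits on either strand."""
    strands = {"top": top, "bottom": reverse_complement(top)}
    hits: Dict[str, List[Tuple[str, int]]] = {}
    for enzyme, motif in sites.items():
        for name, strand in strands.items():
            pos = strand.find(motif)
            while pos != -1:
                hits.setdefault(enzyme, []).append((name, pos))
                pos = strand.find(motif, pos + 1)
    return hits


# P-II-1 carries the 4-nt 5' extension; the rest should pair with P-II-2.
overhang = P_II_1[:4]
anneals = reverse_complement(P_II_1[4:]) == P_II_2
sites = find_sites(P_II_1, TYPE_IIS_SITES)

print("5' extension:", overhang)
print("P-II-2 pairs with P-II-1 beyond the extension:", anneals)
for enzyme, hits in sites.items():
    print(enzyme, hits)
```

With these sequences each of the four motifs is reported once, on the bottom strand (the strand corresponding to P-II-2).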
Standard PCR primer-directed site-specific mutagenesis is carried out to eliminate all type
IIS restriction endonuclease sites and all 4 base pair restriction endonuclease sites.
VIII. Genome Closure 

A method directed toward genome closure involves the systematic identification of all
transcribed genes. The method involves the generation of cDNAs with a defined 5' end, the
generation of defined priming sites for extension PCR amplification from within the cloned
cDNAs, the systematic amplification of all cDNAs, and the sequencing of all extension products.
Generation of cDNAs with a defined 5' end

As illustrated below, after conversion of mRNA into cDNA by standard methods, the
cDNAs are digested with a 4 base pair cutter. The cDNAs are then ligated into the vector
displaying a type IIS restriction endonuclease site at one end.

1) Generation of cDNAs

                          *Biotin
   ---------cDNA---------TTTTTTTTTTTTTT NotI
   ----------------------AAAAAAAAAAAAAA

2) Digestion with HpaII (HpaII sites marked V)

                          *Biotin
   ---V-----cDNA---V-----TTTTTTTTTTTTTT NotI
   ---V------------V-----AAAAAAAAAAAAAA

3) Capture of cDNAs with streptavidin beads

                    *Biotin
   CGG-------------TTTTTTTTTTTTTT NotI Y
     C-------------AAAAAAAAAAAAAA        bead

4) Ligation of adapter/linker

   adapter/linker
   TCGACTAGTACGAAGA
       GATCATGCTTCTGC

                                  *Biotin
   TCGACTAGTACGAAGACGG---------TTTTTTTTTTTTTT NotI Y
       GATCATGCTTCTGCC---------AAAAAAAAAAAAAA        bead

5) NotI digest, ligation into pUC18 vector (NotI/SalI)

   SalI ---- adapter/cDNA ---- TTTTT NotI
   |_____________ vector _____________|

6) Digestion with BbsI, cuts 6 bp into cDNA

   GAAGACGC        NNNNNNNNN
   CTTCTGCGNNNN        NNNNN
   SalI ------ vector ------ NotI

7) BbsI (fill in with Klenow)

   GAAGACGCNNNN    NNNNNNNNN
   CTTCTGCGNNNN    NNNNNNNNN
   SalI ------ vector ------ NotI
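Steps 2-3 and 6-7 of the scheme above can be modeled on a toy sequence. This Python sketch rests on stated assumptions: (i) the cDNA is written 5'->3' on the sense strand and ends in the poly(A) tail, with the biotin on the oligo(dT) end, so the beads retain the fragment downstream of the last HpaII site (C^CGG); and (ii) BbsI cuts GAAGAC(2/6), that is, 2 nt past its site on the top strand and 6 nt past it on the bottom strand, leaving a 4-nt 5' overhang that Klenow fill-in copies onto both ends. The cleavage geometry comes from standard enzyme references, not from this text; the function names and the toy cDNA are hypothetical, and only the first top-strand BbsI site is considered.

```python
from typing import Optional, Tuple

HPAII_SITE = "CCGG"               # HpaII cuts C^CGG
BBSI_SITE = "GAAGAC"              # BbsI cuts GAAGAC(2/6), assumed from enzyme references
ADAPTER_TOP = "TCGACTAGTACGAAGA"  # adapter top strand (SEQ ID NO:27)


def hpaii_capture(sense: str) -> Optional[str]:
    """Return the 3'-most HpaII fragment, i.e. the one retained on the beads.

    Assumes `sense` is the sense strand 5'->3' ending in the poly(A) tail.
    """
    pos = sense.rfind(HPAII_SITE)
    if pos == -1:
        return None                # no HpaII site: nothing is captured
    return sense[pos + 1:]         # C^CGG leaves a fragment starting with CGG


def bbsi_cut_and_fill(top: str) -> Optional[Tuple[str, str, str]]:
    """Cut at the first top-strand BbsI site, then model Klenow fill-in.

    Returns (left_end, right_end, overhang) as top-strand sequences of the
    two blunt ends. Sites on the bottom strand are ignored in this sketch.
    """
    i = top.find(BBSI_SITE)
    if i == -1:
        return None
    site_end = i + len(BBSI_SITE)
    top_cut = site_end + 2          # top strand is cut 2 nt past the site
    bottom_cut = site_end + 6       # bottom strand is cut 6 nt past the site
    overhang = top[top_cut:bottom_cut]   # 4-nt 5' overhang
    # Fill-in extends the recessed 3' end at each cut, so both blunt ends
    # carry the same four overhang bases.
    return top[:bottom_cut], top[top_cut:], overhang


# Toy example (hypothetical sequence).
cdna = "ATGCCGGTTACCGGAAGTCAAAAAAAA"
fragment = hpaii_capture(cdna)
construct = ADAPTER_TOP + fragment
left, right, overhang = bbsi_cut_and_fill(construct)
print(fragment, overhang)
print(left)
print(right)
```

The duplicated four bases on the two blunt ends are what later allow the same NNNN to appear on both sides of the inserted linker.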



Generation of defined priming sites for extension PCR amplification from within the 
cloned cDNAs. 

The cDNAs are cloned into a vector, amplified and digested with the type IIS restriction
endonuclease. Linkers are cloned into the linearized plasmid/cDNA fragment.



8) BbsI

   CATGATCATGG
   GTACTAGTACCGG

9) Ligate on linkers

   GAAGACGCNNNNCATGATCATGG        GGCCATGATCATGNNNNNNNNN
   CTTCTGCGNNNNGTACTAGTACCGG        GGTACTAGTACNNNNNNNNN
   SalI ------------------- vector ------------------- NotI

10) Treat with T4 DNA polymerase in presence of (dA, dT)

   a) GAAGACGCNNNNCATGATCAT            GGCCATGATCATGNNNNNNNNN
      CTTCTGCGNNNNGTACTAGTACCGG            TACTAGTACNNNNNNNNN
      SalI ------------------- vector ------------------- NotI
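Step 10 relies on the exonuclease activity of T4 DNA polymerase. As a simplified model (an assumption, not a statement from the text), in the presence of only dATP and dTTP the enzyme trims 3' ends until it reaches an A or a T, where its polymerase activity replaces whatever is removed. The Python sketch applies this rule to the two free 3' ends left after step 9, using strands taken from SEQ ID NO:33 to NO:36 (all written 5'->3'), reproduces the trimmed strands listed as SEQ ID NO:37 and NO:40, and checks whether the resulting 5' overhangs can anneal. Function names are hypothetical.

```python
from typing import Set

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def t4_chew_back(strand: str, dntps: Set[str]) -> str:
    """Trim the 3' end of a 5'->3' strand until its terminal base is supplied.

    Simplified model: bases that cannot be re-added from the supplied dNTPs
    are removed; trimming stops at the first supplied base. An unknown
    base (N) is treated as not supplied.
    """
    end = len(strand)
    while end > 0 and strand[end - 1] not in dntps:
        end -= 1
    return strand[:end]


def five_prime_overhang(long_strand: str, short_strand: str) -> str:
    """5' bases of long_strand left unpaired (both strands share their far end)."""
    return long_strand[: len(long_strand) - len(short_strand)]


# Free ends after step 9, each strand written 5'->3'.
left_top = "GAAGACGCNNNNCATGATCATGG"        # SEQ ID NO:33, 3' end at the junction
left_bottom = "GGCCATGATCATGNNNNGCGTCTTC"   # SEQ ID NO:34, 5' end at the junction
right_top = "GGCCATGATCATGNNNNNNNNN"        # SEQ ID NO:35, 5' end at the junction
right_bottom = "NNNNNNNNNCATGATCATGG"       # SEQ ID NO:36, 3' end at the junction

supplied = {"A", "T"}                       # step 10: dA and dT only
left_top_trim = t4_chew_back(left_top, supplied)
right_bottom_trim = t4_chew_back(right_bottom, supplied)

left_overhang = five_prime_overhang(left_bottom, left_top_trim)
right_overhang = five_prime_overhang(right_top, right_bottom_trim)
# Two 5' overhangs anneal when one is the reverse complement of the other.
compatible = left_overhang == reverse_complement(right_overhang)

print(left_top_trim, right_bottom_trim)
print(left_overhang, right_overhang, compatible)
print(left_top_trim + right_top)            # top strand across the re-ligated junction
```

Both ends carry a GGCC overhang, which is its own reverse complement, so the two ends of the same linearized plasmid can anneal; the joined top strand matches the start of SEQ ID NO:42.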



Systematic amplification of all cDNAs 

Extension PCR is carried out as described in Example V with combinations of defined primer
sets (up to 10-fold degeneracy can be achieved, depending on the type IIS enzyme selected for the
step 4 adapter). Therefore, every cDNA present in the library can be selectively amplified
in a systematic manner.



                   ATCATGGCCATGATCATGNNNN->
   GAAGACGCNNNNCATGATCATGGCCATGATCATGNNNNNNNNNNNNNNNNN
   CTTCTGCGNNNNGTACTAGTACCGGTACTAGTACNNNNNNNNNNNNNNNNN
         <-NNNNGTACTAGTACCGGTACTA
   SalI ------------------- vector ------------------- NotI

Since the two NNNN blocks are symmetrical, the primers are used in pairs. For example, if
5' NNNN 3' = 5' AGCA 3', then

5' ATCATGGCCATGATCATGAGCA 3' will form a pair with
5' ATCATGGCCATGATCATGTCGT 3'

Each PCR reaction will therefore cover 2 primer combinations:

combination 1:

5' ATCATGGCCATGATCATGAGCA 3'
5' ATCATGGCCATGATCATGTCGT 3'

and

combination 2:

5' ATCATGGCCATGATCATGAGCA 3'
5' ATCATGGCCATGATCATGTCGT 3'

5 

Example of selective amplification with a 4-fold degenerate primer set resulting in 128
pools of cDNAs

To cover all possible combinations of 4 bases, 256 oligonucleotides
(5' ATCATGGCCATGATCATGNNNN 3') will be synthesized. Since every PCR reaction covers
2 combinations, 128 reactions will cover all possible combinations. A range of 200-300
PCR products per reaction is expected for a specific tissue. Since this is an inverse PCR
approach, the products are sequenced directly with the wobble primer, increasing the
resolution by a factor of 3. The PCR products are then directly thermocycled with the
wobble primers, resulting in a higher resolution. One colour is chosen for each nucleotide,
with no terminator included in the reaction. The resulting products are run on a sequencing
gel.
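The primer arithmetic of this example can be reproduced with a short Python sketch. It enumerates the 256 primers 5' ATCATGGCCATGATCATGNNNN 3' and groups them into reactions using the pairing rule of the AGCA/TCGT example above, where the partner primer carries the base-by-base complement of the 3' tetramer. Because no base is its own complement, every primer gets a distinct partner and the 256 primers fall into exactly 128 reactions. The 200-300 products per reaction figure is taken from the text; the totals are simple multiplication, not measured data. Function names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

CORE = "ATCATGGCCATGATCATG"                 # constant part of every primer
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def all_primers(core: str = CORE, n: int = 4) -> List[str]:
    """core + every possible n-base 3' end: 4**n primers."""
    return [core + "".join(bases) for bases in product("ACGT", repeat=n)]


def pair_primers(primers: List[str], core: str = CORE) -> List[Tuple[str, ...]]:
    """Group primers into reactions with the example's pairing rule.

    The partner carries the base-by-base complement of the 3' tail
    (e.g. ...AGCA pairs with ...TCGT). Each pair is stored once, sorted.
    """
    pairs = set()
    for primer in primers:
        partner = core + primer[len(core):].translate(_COMPLEMENT)
        pairs.add(tuple(sorted((primer, partner))))
    return sorted(pairs)


primers = all_primers()
reactions = pair_primers(primers)
print(len(primers), "primers ->", len(reactions), "reactions")

# 200-300 products per reaction (figure from the text) over all reactions.
low, high = 200 * len(reactions), 300 * len(reactions)
print("expected products in total:", low, "-", high)
```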

All publications and patents mentioned in the above specification are herein
incorporated by reference. Various modifications of the described method and system of
the invention will be apparent to those skilled in the art without departing from the scope
and spirit of the invention. Although the invention has been described in connection with
specific preferred embodiments, it should be understood that the invention as claimed
should not be unduly limited to such specific embodiments. Indeed, various modifications
of the above-described modes for carrying out the invention which are obvious to those
skilled in the field of molecular biology or related fields are intended to be within the scope
of the following claims.
SEQUENCE LISTING

(1) GENERAL INFORMATION

(i) APPLICANT: INCYTE PHARMACEUTICALS, INC.

(ii) TITLE OF THE INVENTION: METHODS FOR GENERATING AND ANALYZING
TRANSCRIPT MARKERS

(iii) NUMBER OF SEQUENCES: 50

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: Incyte Pharmaceuticals, Inc.

(B) STREET: 3174 Porter Drive

(C) CITY: Palo Alto

(D) STATE: CA

(E) COUNTRY: U.S.
(F) ZIP: 94304

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Diskette
(B) COMPUTER: IBM Compatible

(C) OPERATING SYSTEM: DOS

(D) SOFTWARE: FastSEQ Version 1.5

(vi) CURRENT APPLICATION DATA:

(A) PCT APPLICATION NUMBER: To Be Assigned

(B) FILING DATE: Filed Herewith

(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: US 03/723, 6<1G
(B) FILING DATE: 03-OCT-1996

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Billings, Lucy J.

(B) REGISTRATION NUMBER: 36,749

(C) REFERENCE/DOCKET NUMBER: IN-0001 PCT

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 650-855-0555
(B) TELEFAX: 650-845-4166

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
AATTCAGCTG GAG 13
(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 base pairs

(B) TYPE : nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 2 : 
CTCCAGCTC 9 
(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 41 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

CTGGAGNNNN NNNNNNNNNN NNNNNNNNNN NNNNNCTCCA G 41

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 14 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(vii) IMMEDIATE SOURCE:
(A) LIBRARY: GenBank
(B) CLONE: 31340, 31342, 2Qn3A

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

CCGAGAGTCG TCGG 14

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(vii) IMMEDIATE SOURCE:
(A) LIBRARY: GenBank
(B) CLONE: 31340

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

GCTGGAGCCG AGAGTCGTCG G 21

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

AATTCAGCTG GAG 13

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 35 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
AATTCAGCTG GAGNNNNNNN NNNNNNNNNC AGCTG 35
(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 31 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 8 : 
GTCGACCTCN NNNNNNNNNN NNNNNGTCGA C 31 
(2) INFORMATION FOR SEQ ID NO : 9 : 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 9 : 
CCCGAGGTCG ACTTAA

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

CTGGAGNNNN NNNNNNNNNN NN 

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 11 : 

GACCTCNNNN NNNNNNNNNN

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
GTTGAATACT CATACTCTTC C 

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
GCTGGCCTTT TGCTCACATG 

(2) INFORMATION FOR SEQ ID NO: 14: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 60 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:

GAATTCACCT GCAGGGACGT CTACTGAAGT ACTTAAGGAT GAATGCCTGG AGCTCGCATC

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
TGTGCAGCTG ACGGCGCGCC GCGGCCGCA

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 120 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:

CAACCACCCG GGCCCTCCAG CTGGAGGAAA AAATGCTAGG CTGGAGGGC''' GATCTTTTC^^
CTGGAGCTAG TTCTAGATCG CTGGAGCTGC GCCCGGCCCG GGC^CCGGC-S ATCCCTCCAC 

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 base pairs

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

GGCCCGGGTG GTTG 

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

GAAAAAATGC TAGG 

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

GGCTGATGTT TTCC 

(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:
CTAGTTCTAG ATCG

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 14 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
CTGCGCCCGG CCCG 

(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 14 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22: 
GGATAGCCGG TCCC 14

(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 93 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: 

CATGTGCGGC CGCGGCGCGC CGTCAGCTGC ACAGATGCGA GCTCCAGGCA TTCATCCTTA 60 
AGTACTTCAG TAGACGTCCC TGCAGGTGAA TTC 93 

(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 89 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

GAATTCACCT GCAGGGACGT CTACTGAAGT ACTTAAGGAT GAATGCCTGG AGCTCGCATC 60
TGTGCAGCTG ACGGCGCGCC GCGGCCGCA 89

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
AAAAAAAAAA AAAA 14
(2) INFORMATION FOR SEQ ID NO: 27:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 16 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
TCGACTAGTA CGAAGA 

(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 14 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:
CGTCTTCGTA CTAG 14
(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 12 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:
GAAGACGCNN NN 

(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:30:
NNNNGCGTCT TC 

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 11 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
CATGATCATG G 11
(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 13 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32:
GGCCATGATC ATG 13 
(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
GAAGACGCNN NNCATGATCA TGG 23
(2) INFORMATION FOR SEQ ID NO:34:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
GGCCATGATC ATGNNNNGCG TCTTC 25 
(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 
GGCCATGATC ATGNNNNNNN NN 22
(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
NNNNNNNNNC ATGATCATGG 20 
(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
GAAGACGCNN NNCATGATCA T 21

(2) INFORMATION FOR SEQ ID NO: 38:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
GGCCATGATC ATGNNNNGCG TCTTC 25
(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 

GGCCATGATC ATGNNNNNNN NN 22 

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:
NNNNNNNNNC ATGATCAT 18

(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:
ATCATGGCCA TGATCATGNN NN 22

(2) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 51 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:

GAAGACGCNN NNCATGATCA TGGCCATGAT CATGNNNNNN NNNNNNNNNN N 51 

(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 51 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:

NNNNNNNNNN NNNNNNNCAT GATCATGGCC ATGATCATGN NNNGCGTCTT C 51 

(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 22 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:
ATCATGGCCA TGATCATGNN NN

(2) INFORMATION FOR SEQ ID NO: 45:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 

ATCATGGCCA TGATCATGAG CA 22 

(2) INFORMATION FOR SEQ ID NO: 46:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:
ATCATGGCCA TGATCATGTC GT 22
(2) INFORMATION FOR SEQ ID NO: 47:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 22 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:
ATCATGGCCA TGATCATGAG CA 22
(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:
ATCATGGCCA TGATCATGTC GT 22 

(2) INFORMATION FOR SEQ ID NO: 49:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:

ATCATGGCCA TGATCATGAG CA 

(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50:
ATCATGGCCA TGATCATGTC GT 
Claims:

1. A method for generating and analyzing transcript markers from the 5' most end
of individual cDNAs of a cDNA library, comprising the steps of:

a) obtaining a cDNA library comprising individual cDNAs having a first
restriction endonuclease site for a restriction endonuclease that digests the cDNA at the 5'
most end within an expected distance from its recognition site and a second endonuclease
restriction site;

b) subjecting the cDNAs to digestion with the first and the second
restriction endonuclease, thereby excising transcript markers from the 5' most end of the
individual cDNAs;

c) ligating said transcript markers to a vector;

d) transforming said vector containing the transcript markers in a host cell
and culturing said host cells; and

e) performing nucleic acid sequence analysis of the transcript markers.

2. The method of Claim 1 further comprising the step of creating blunt ends on the
5' transcript markers after step b).

3. The method of Claim 1 wherein said first restriction enzyme is a Type IIS
restriction enzyme.

4. The method of Claim 3 wherein the Type IIS restriction enzyme is selected from
the group consisting of BpmI, BsgI, Eco57I and BsmFI.

5. The method of Claim 1 wherein said cDNA library has been constructed by
normalization techniques.

6. The method of Claim 1 wherein said cDNA library has been constructed using
random primers.

7. The method of Claim 1 wherein said cDNA library has been constructed using
oligo dT primers.

8. The method of Claim 1 wherein the cDNA library has been constructed from
mRNA treated with a demethylating agent.

9. The method of Claim 8 wherein the demethylating agent is 5-aza-2'-deoxycytidine.

10. The method of Claim 1 wherein digesting the cDNA library with the second
restriction enzyme creates a blunt end.
11. The method of Claim 1 optionally comprising the step of concatenating said 5'
transcript markers after step b), thereby forming a serial arrangement of multiple 5'
transcript markers.

12. The method of Claim 11 optionally comprising the step of amplifying the serial
arrangement of multiple 5' transcript markers by polymerase chain reaction prior to ligating
to a vector.

13. A method for determining a representation of gene expression in a cDNA
library comprising the steps of:

a) obtaining a cDNA library comprising individual cDNAs having a first
restriction endonuclease site for a restriction endonuclease that digests the cDNA at the 5'
most end within an expected distance from its recognition site and a second endonuclease
restriction site, and wherein the cDNA library has been grown under non-amplified growth
conditions;

b) generating and isolating 5' transcript markers from the cDNA library,
wherein each isolated 5' transcript marker identifies an individual cDNA;

c) ligating the isolated 5' transcript markers to a vector and transforming the
vector in a host cell;

d) culturing said host cells;

e) performing nucleic acid sequencing on the 5' transcript markers in said
transformed host cells to create a nucleic acid data set; and

f) subjecting the data set to analysis to determine the relative abundance of
individual 5' transcript markers, thereby determining a representation of gene expression in
the cDNA library.

14. The method of Claim 13 comprising concatenating the isolated 5' transcript
markers after step b), thereby creating a serial arrangement of multiple 5' transcript markers.

15. The method of Claim 13 wherein said cDNA library has been constructed with
oligo dT primers.

16. The method of Claim 13 wherein after step b) the isolated 5' transcript markers
are blunt ended.

17. A method for rapid nucleic acid sequence analysis of cDNA comprising the
steps of:

a) obtaining a cDNA library comprising individual cDNAs having a first
restriction endonuclease site for a restriction endonuclease that digests the cDNA at the 5'
most end within an expected distance from its recognition site and a second endonuclease
restriction site;

b) subjecting the cDNA library to digestion with the first restriction
endonuclease and the second restriction endonuclease to create 5' transcript markers which
identify individual cDNAs;

c) isolating said 5' transcript markers;

d) concatenating said 5' transcript markers to create a serial arrangement of
multiple transcript markers;

e) ligating said serial arrangement of multiple transcript markers to a vector
and transforming said host cell with the vector;

f) performing nucleic acid sequencing analysis on said serial arrangement of
5' transcript markers and identifying transcript markers;

g) preparing a first and a second PCR primer, wherein the first PCR primer
is designed from a transcript marker identified in step f) and the second PCR primer is
designed from a section of nucleic acid common to all cDNAs in the cDNA library;

h) subjecting the cDNA library to a polymerase chain reaction using the
primers of step g), thereby identifying the cDNA designated by a transcript marker; and

i) performing nucleic acid sequencing on the identified DNA.

18. The method of Claim 17 wherein the polymerase chain reaction of step h) is
performed in 96 well plates.

19. The method of Claim 17 optionally comprising the step of amplifying the serial
arrangement of multiple 5' transcript markers by polymerase chain reaction after step e).

20. The method of Claim 17 further comprising the step of creating blunt ends on
the 5' transcript markers after step c).

21. The method of Claim 17 wherein said first restriction enzyme is a Type IIS
restriction enzyme.

22. The method of Claim 17 wherein the Type IIS restriction enzyme is selected
from the group consisting of BpmI, BsgI, Eco57I and BsmFI.

23. The method of Claim 17 wherein said cDNA library has been constructed by
normalization techniques.

24. The method of Claim 17 wherein said cDNA library has been constructed using
random primers.

25. The method of Claim 17 wherein said cDNA library has been constructed using
oligo dT primers.

26. The method of Claim 17 wherein the cDNA library has been constructed from
mRNA treated with a demethylating agent.

27. The method of Claim 26 wherein the demethylating agent is 5-aza-2'-deoxycytidine.

28. The method of Claim 17 wherein digesting the cDNA library with the second
restriction enzyme creates a blunt end.

29. A method for the rapid, sequence-specific identification of cDNAs derived from
a human mRNA population, comprising the steps of:

a) obtaining a cDNA library comprising individual cDNAs wherein said
cDNAs contain first restriction endonuclease sites for an endonuclease having a 4 base pair
recognition site and wherein said cDNAs are cloned into a first vector lacking the first
restriction endonuclease sites;

b) subjecting the cDNA library to digestion with the first restriction
endonuclease, thereby creating linearized cDNAs containing a portion of the original
cDNA;

c) ligating an adapter to said linearized cDNAs wherein the adapter contains
a second restriction endonuclease site for an endonuclease that cleaves within an expected
range of its recognition site, thereby creating cDNAs containing two non-contiguous
transcript markers joined by the adapter;

d) digesting the cDNAs of step c) with the second restriction endonuclease,
thereby excising the transcript markers from said cDNAs;

e) concatenating said excised transcript markers;

f) ligating said concatenated transcript markers to a second vector;

g) transforming said second vector containing the transcript markers in a
host cell and culturing said host cells; and

h) performing nucleic acid sequence analysis on the transcript markers in
said host cells.

30. The method of Claim 29 optionally comprising concatenating the transcript
markers after step e) to create a serial arrangement of multiple transcript markers.
31. The method of Claim 29 wherein the cDNA library has been constructed from
mRNA treated with a demethylating agent.

32. The method of Claim 29 wherein the demethylating agent is 5-aza-2'-deoxycytidine.

33. The method of Claim 29 wherein the cDNA library has been constructed by
normalization techniques.

34. The method of Claim 29 wherein the second endonuclease is BcgI.

35. The method of Claim 29 wherein the nucleic acid sequence analysis comprises
amplifying transcript markers in an outward manner using 2 specific primers and
sequencing directly using a wobble primer.

36. The vector as shown in Figure 13.

37. A method for generating and analyzing non-contiguous transcript markers
derived from the 5' most and 3' most end of individual cDNAs, comprising the steps of:

a) obtaining a cDNA library comprising individual cDNAs having a first
restriction endonuclease site at the 5' most end and at the 3' most end for a restriction
endonuclease that digests nucleic acid within an expected distance from the first
endonuclease recognition site, and a second endonuclease restriction site;

b) subjecting the cDNAs to digestion with the first endonuclease, thereby
creating linearized cDNAs containing transcript markers from the 5' most end and the 3'
most end of the individual cDNAs;

c) self-ligating said linearized cDNAs, thereby joining the transcript markers
from the 5' most and 3' most ends to create cDNAs containing non-contiguous transcript
markers;

d) transforming said linearized cDNAs in a host cell and culturing said host
cells;

e) isolating said cDNA from said host cells;

f) digesting the cDNA with the second restriction endonuclease, thereby
excising the non-contiguous transcript markers;

g) ligating said excised transcript markers to a second vector;

h) transforming said second vector containing the transcript markers in a
host cell and culturing said host cells; and

i) performing nucleic acid sequence analysis of the transcript markers.
38. The method of Claim 37 further comprising creating blunt ends on the 5'
transcript markers after step b).

39. The method of Claim 37 wherein said first restriction enzyme is a Type IIS
restriction enzyme.

40. The method of Claim 37 wherein the Type IIS restriction enzyme is selected
from the group consisting of BpmI, BsgI, Eco57I and BsmFI.

41. The method of Claim 37 wherein said cDNA library has been constructed by
normalization techniques.

42. The method of Claim 37 wherein said cDNA library has been constructed using
random primers.

43. The method of Claim 37 wherein said cDNA library has been constructed using
oligo dT primers.

44. The method of Claim 37 wherein the cDNA library has been constructed from
mRNA treated with a demethylating agent.

45. The method of Claim 44 wherein the demethylating agent is 5-aza-2'-deoxycytidine.

46. The method of Claim 37 wherein digesting the cDNA library with the second
restriction enzyme creates a blunt end.

47. The method of Claim 37 comprising concatenating said 5' transcript markers
after step b), thereby forming a serial arrangement of multiple 5' transcript markers.

48. The method of Claim 47 comprising amplifying the serial arrangement of
multiple 5' transcript markers by polymerase chain reaction prior to ligating to a vector.